(54) Title: PRODUCTION OF FUNCTIONAL HYBRID GENES AND PROTEINS

(57) Abstract: The invention relates to an improved method for creating gene and protein libraries, particularly random gene libraries
encoding for hybrid proteins containing fragments from one or two parent proteins. The method may be used to make libraries of
circularly permuted variants of genes encoding a single protein or hybrid proteins, especially single-chain proteins. In addition, the
invention can be used to make libraries for protein fragment complementation, in which the two fragments originate from one parent
protein or from two different parent proteins. The method can produce a library of genes mostly of the correct size, leading to a
high fraction of functional hybrids or complements. When coupled with suitable screening or selection, the method can be used to
create and identify hybrid proteins, including new proteins with new or altered properties. The invention also provides libraries of
hybrid proteins, especially single-chain proteins, that include an N-terminal sequence originating from one parent protein, fused to
a C-terminal sequence of a second parent protein, with both sequences varying in length among the hybrids of the library.
Production of Functional Hybrid Genes and Proteins



FIELD OF THE INVENTION 

This invention relates to methods for creating novel DNA and amino acid sequences, 
especially the production of gene libraries encoding polypeptides or proteins, and 
corresponding protein libraries. Libraries can be made for individual proteins, or for hybrid 
proteins comprised of fragments from different proteins. The method can also be used to 
make random circular permutations of a protein. Random protein fragments can be made that 
complement one another to form a functional protein. Protein fragments that assemble to
form a functional protein can be identified, for example by screening or selection. The
method can also be used to create novel hybrid or chimeric proteins by assembling fragments
taken from different parent proteins, or by creating circular permutations containing fragments
from different parent proteins. For example, DNA libraries can be made which encode for
hybrid or chimeric proteins of an N-terminal part originating from one protein fused to a
C-terminal part of another protein. Screening or selection of the resulting library can be used
to identify proteins with useful properties. Genes encoding any useful protein or proteins can 
be used as parents, starting materials, or templates for the invention. This invention also 
relates to a method which may be used for the selection of gene library repertoires that have 
a continuous reading frame. 
BACKGROUND OF THE INVENTION

The publications and reference materials noted herein and listed in the appended 
Bibliography are each incorporated by reference in their entirety. 

New and useful proteins can be obtained in many ways, often by altering known
proteins to obtain new or altered properties. One strategy to generate proteins with improved
properties over existing, i.e. wild-type, proteins is called directed evolution. For this purpose,
DNA recombination techniques, including techniques known as "DNA shuffling", have
become powerful tools. In one technique, called bisection, it has been found that there are
proteins which tolerate being cut or synthesized as separate fragments. The fragments can be
reassembled in vitro or in vivo to yield functional proteins in the form of dimers. This
method, called protein fragment complementation, is thought to rely on interchain packing
interactions between the protein fragments to restore biological function. (See Bibi and
Kaback, 1990; Burbaum and Schimmel, 1991; Hall and Frieden, 1989; Hantgan and Taniuchi,
1977; Labhardt, 1982; Shiba and Schimmel, 1992; Taniuchi et al., 1977; and Yang and
Schachman, 1993.)

In a variation of these methods, a protein or polypeptide can be connected via the
original amino (N) and carboxy (C) terminals and bisected to yield new molecules in a process
known as circular permutation. (See Mullins et al., 1994; Protasova et al., 1994; Vignais et
al., 1995; Yang and Schachman, 1993; and Zhang et al., 1993.) Circular permutation
reorganizes the primary sequence of the protein so that the original amino and carboxy
terminals are covalently closed, and new terminals are created at a different site within the
sequence. Covalent closure of the natural terminals can involve insertion of one or more
amino acids, for example if the terminals are not close enough in space to be directly linked
to each other. Proteins reorganized in this way may retain some or all of their original
biological function and properties, and may have new functions or properties.

Traditionally, cleavage or bisection sites, and the sites for new amino and carboxy
terminals of circularly permuted proteins, are chosen based on some knowledge of the protein
structure or behavior. Typical sites have included, for example, cleavage sites of limited
proteolytic digestion, or regions of the protein thought to be flexible (e.g. loops).

Graf and Schachman, 1996, produced variants of aspartate transcarbamoylase (ATC)
by random circular permutation, and also by constructing a gene homodimer connected by a
short linker sequence. Thereafter, the gene dimer was cut at a specific site by digestion with
a restriction endonuclease that recognizes a unique site in the gene. After circularizing the
obtained fragments by ligation, using the cohesive ends left by the digest, the fragments were
randomly linearized by treatment with DNase I. This approach created random circular
permutations or permutants of one protein, but could not be used to create libraries for protein
fragment complementation.

Another method for creating a library of complementary protein fragments is
suggested by Ostermeier et al., 1999. In this method, a library is made from two different
parent proteins, to create heterodimers in which one fragment comes from a first parent and
the other from a second parent. However, incremental gene truncation is used to create a
library of fragment pairs which are largely useless or non-functional. Most fragments cannot
be combined to produce or sum up to a single complete protein. Also, a large fraction of the
protein fragments will not be able to fold properly, or will not be able to dimerize.

Various DNA shuffling methods have been developed to produce protein libraries that
are hybrids between two or more parent proteins. These include in vitro methods (See Shao
et al., 1998; Stemmer, 1994; and Zhao et al., 1998) and in vivo methods (See Okkels, 1997; and
Volkov, 1999). These methods use protein monomers, and they require regions of high DNA
sequence identity between parent proteins, generally at least 80%. Fragments obtained from
more distant parents cannot be recombined, for example because crossover will not occur.
However, many evolutionarily related proteins have highly similar structures, and may have
similar functions, but do not share a high degree of DNA sequence identity. For
example, it is not uncommon for proteins which have similar three-dimensional structures to
have only 20-30% sequence identity.

Accordingly, there is a need for combinatorial approaches to protein design which do
not require homology or a high degree of sequence identity. Methods which do not depend
on detailed knowledge of parent proteins would also be useful. Given the vast numbers of
proteins about which little is known, there is a need for random or unbiased methods to
identify cleavage sites that yield functional proteins when one or more parent proteins are
permuted, or when protein fragments are made for complementation.

Furthermore, there is a need for a method to recombine sequence elements from
multiple proteins which does not require high levels of DNA sequence identity. Such a method
could also be used to create a library of hybrid, or chimeric, proteins with fragments from two
(or more) parent sequences. In particular, random methods which are capable of producing
relatively large numbers of hybrids relatively quickly, and from which molecules exhibiting
desired characteristics can be identified by screening, would be advantageous.

In addition, the current techniques for protein fragment complementation and circular
permutation tend to generate a bisection of each polypeptide chain somewhere between the
original N- and C-terminals of the protein. This may impair protein folding, and also
influence or impair the ability of the resulting hybrid protein to function. (See Graf et al.,
1996; and Hennecke et al., 1999.) Such a constraint may severely and unnecessarily limit the
proportion of functional proteins in the library. Thus, there is a need for techniques preserving
or recreating the original N- and C-terminal sequences of the parent protein(s).

SUMMARY OF THE INVENTION 

The present invention provides an improved method for creating gene and protein 
libraries. The invention can be used to make random libraries of circularly permuted variants 
of genes encoding a single protein, or hybrid proteins containing fragments from two or more 
parent proteins. The invention can also be used to create a library for protein fragment 
complementation, in which fragments originate either from one protein or from different 
proteins, typically from two proteins. In one embodiment, hybrid proteins can be created from 
two parent proteins independent of sequence similarity between the parent proteins. In another 
embodiment, the invention may be used to create a library of genes which are mostly of a size
appropriate for successful recombination into full-length proteins. This significantly increases
the likelihood that a relatively high fraction of complements in the library will
be functional. In still another embodiment, the invention can be used to create a library of
truncated or elongated genes from one or more parent genes.

The invention also provides a method to create libraries of hybrid proteins, especially
single-chain hybrids, that may have an N-terminal part originating from one protein fused to
a C-terminal part of a second protein, with both parts varying in length, while the total length
is comparable to a parent protein. This technique is designed to further increase the fraction
of functional proteins expressed or produced using hybrid genes in the library. Methods
provided herein can also be used to create a random library of single-chain hybrid proteins that
consist of fragments of several proteins. In addition, the method can be used to make libraries
of proteins that have small interior sequence duplications or deletions of random length and
at random positions. In particular, the invention can produce libraries of hybrid proteins with
preserved N- and C-terminal sequences.

Thus, the invention provides a method which, particularly when coupled with
screening or selection, can be used to create and identify new gene and protein libraries,
including new proteins with useful properties. The basic strategy for creating these libraries
involves manipulation of the DNA encoding the protein or proteins, followed by expression,
either in vivo (e.g., in host cells) or in vitro.

Random Fragments for Protein Fragment Complementation and Circular Permutation 

In one embodiment, a gene dimer is constructed as a homodimer or heterodimer, i.e.
a polynucleotide is made from two genes or portions of genes that encode for the same protein
or for different proteins. Typically, each dimer comprises two complete genes, identical or
non-identical, placed in tandem on a single piece of DNA, and separated by a linker sequence. The
linker sequence encodes for at least one restriction site that is unique in the dimer construct.

If so desired, gene concatemers can also be made.
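As a rough in-silico illustration of the dimer design, the sketch below joins gene 1, a linker and gene 2 as plain strings and checks that a chosen restriction site occurs exactly once in the construct. The toy sequences and the helper names `count_site` and `build_dimer` are hypothetical, and the choice of the BamHI site (GGATCC) is only an example; because the check reads the top strand only, a non-palindromic site would also have to be searched on the reverse strand.

```python
def count_site(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site on the top strand."""
    seq, site = seq.upper(), site.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site)


def build_dimer(gene1: str, linker: str, gene2: str, site: str) -> str:
    """Place two genes in tandem through a linker; require the linker's site to be unique."""
    dimer = (gene1 + linker + gene2).upper()
    if site.upper() not in linker.upper():
        raise ValueError("linker does not contain the restriction site")
    if count_site(dimer, site) != 1:
        raise ValueError("restriction site is not unique in the dimer")
    return dimer


# Hypothetical toy "genes" (not real sequences); GGATCC is the BamHI recognition site.
gene_a = "ATGAAACCCGGGTTTTAA"
gene_b = "ATGCCCAAAGGGTTTTGA"
dimer = build_dimer(gene_a, "GGATCC", gene_b, "GGATCC")
print(len(dimer))  # 42
```

A heterodimer is shown; passing the same sequence twice gives a homodimer.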

To construct a random circular permutation library, the linker sequence is preferably
designed such that the reading frame is continuous, and the original 5' and 3' terminal ends of
the structural gene are connected. Appropriate linker sequences can either insert, delete, or
mutate amino acids in the protein sequence, or they can leave the protein sequence unchanged,
except for covalent attachment of the N- and C-terminal amino acids.
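The two linker requirements for circular permutation, a continuous reading frame and no premature stop, can be checked with a short script such as the one below. It is a minimal sketch that assumes the gene is supplied without its own stop codon (so that its 3' end can run into the linker) and uses the stop codons TAA, TAG and TGA of the standard genetic code; `linker_keeps_frame` is a hypothetical helper, and the sequences are toy examples.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_frame_stops(seq: str) -> list[int]:
    """Return the 0-based codon indices of stop codons read in frame 0."""
    seq = seq.upper()
    return [i // 3 for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def linker_keeps_frame(gene_no_stop: str, linker: str) -> bool:
    """True if gene + linker + gene reads as one open frame across the junction."""
    if len(gene_no_stop) % 3 or len(linker) % 3:
        return False  # a length that is not a multiple of 3 shifts the downstream frame
    return not in_frame_stops(gene_no_stop + linker + gene_no_stop)


print(linker_keeps_frame("ATGAAACCCGGG", "GGTTCT"))  # True (linker encodes Gly-Ser)
print(linker_keeps_frame("ATGAAACCCGGG", "TAAGGC"))  # False (in-frame TAA)
```

Checking the whole gene-linker-gene construct, rather than the linker alone, also catches stop codons that only appear across the junction.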

To construct a library for protein fragment complementation, the linker sequence 
should encode a stop translation signal of the upstream gene fragment of the dimer and a 
translation initiation signal of the downstream gene fragment of the dimer. 
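For fragment complementation the linker plays the opposite role. A toy check, assuming the upstream gene's codons continue into the linker at a known offset, is that an in-frame stop codon occurs and is followed by an ATG for the downstream fragment. The function name `complementation_linker_ok` is hypothetical, and the sketch ignores other initiation requirements: for expression in bacteria, for example, a ribosome-binding site upstream of the ATG is normally also needed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def complementation_linker_ok(linker: str, frame: int = 0) -> bool:
    """Check that, read from `frame`, the linker stops translation and then offers an ATG.

    `frame` (0, 1 or 2) is the offset at which the upstream gene's codons continue
    into the linker. The ATG may lie in any frame after the stop, because the
    downstream fragment is translated independently from its own start codon.
    """
    linker = linker.upper()
    for i in range(frame, len(linker) - 2, 3):
        if linker[i:i + 3] in STOP_CODONS:
            return "ATG" in linker[i + 3:]
    return False  # no in-frame stop: upstream translation would read through


print(complementation_linker_ok("TAAGGAGGCATG"))  # True
print(complementation_linker_ok("GGATCCATG"))     # False
```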

The gene dimer can be constructed, for example, using the polymerase chain reaction
and subcloned into a suitable vector for amplification. The constructed gene dimer is then
excised and purified after separation from other components of the mixture. The purified
gene dimer is subjected to limited fragmentation, resulting in a mixture consisting of DNA
fragments varying in size. From this mixture, fragments having a predetermined size, or being
within a predetermined size range, can be isolated. In one approach, DNA fragments
approximately the size of a gene monomer are isolated using any one of a range of techniques,
including gel electrophoresis. The resulting DNA will consist of a population of DNA
molecules approximately the size of the parent gene or genes, but with different 5' and 3'
termini.
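The effect of limited fragmentation followed by size selection can be simulated on a toy homodimer. This sketch assumes random double-strand breaks at uniformly chosen positions (a simplification of DNase I digestion or shearing), an exact size cut, and a zero-length linker so that the gene's 3' end is joined directly to its 5' end; all names are hypothetical. Under these assumptions every monomer-length fragment cut out of the dimer is a circular permutation of the parent gene.

```python
import random


def fragment(seq: str, n_cuts: int, rng: random.Random) -> list[str]:
    """Break `seq` at `n_cuts` distinct random internal positions."""
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def is_rotation(candidate: str, gene: str) -> bool:
    """A string is a circular permutation of `gene` iff it has the same length and occurs in gene + gene."""
    return len(candidate) == len(gene) and candidate in gene + gene


gene = "ATGGCTAAAGAACTG"  # toy 15-nt "gene"; a real parent gene would be far longer
dimer = gene + gene        # homodimer with a zero-length linker (assumption)
rng = random.Random(0)

monomer_sized = []
for _ in range(2000):      # many independent digests of separate dimer molecules
    for frag in fragment(dimer, n_cuts=2, rng=rng):
        if len(frag) == len(gene):  # the "gel slice": keep monomer-length pieces only
            monomer_sized.append(frag)

assert all(is_rotation(f, gene) for f in monomer_sized)
print(len(set(monomer_sized)), "distinct permutants recovered")
```

With a real linker, the recovered molecules would carry the linker sequence at the position where the original termini are joined.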

The purified DNA is then treated as necessary and ligated into a suitable expression
plasmid to create a library of randomly circularly permuted genes or proteins, or a library for
protein fragment complementation. The expression plasmid can be used to transform a
suitable host for expression of the proteins. The genes can also be expressed by phage display
(Johansson et al., 1999) or in vitro transcription-translation systems. Functional circular
permutants or complementary fragments that yield functional protein are identified by
screening or selection. Optionally, the repertoire of hybrid variants whose parental fragments
are in one continuous reading frame may be increased by ligating the N-terminals of fragments
in the gene library to a gene encoding for a reporter protein. Preferably, the start codon of
translation (ATG) of this reporter protein has been modified (or removed) to prevent its
independent translation.
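The reporter-fusion option acts as a filter that keeps only fragments whose reading frame runs uninterrupted into the reporter. A hypothetical in-silico version of that filter is sketched below. It assumes translation starts in frame at the fragment's first base (for example from an upstream start codon in the vector), that the ATG-less reporter is appended directly downstream, and that a fragment passes only if its length is a multiple of three and it has no in-frame stop codon. It illustrates the logic of the selection only; it does not replace the experimental screen.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reads_through(fragment: str) -> bool:
    """True if a fragment keeps an open frame into a downstream, ATG-less reporter."""
    fragment = fragment.upper()
    if len(fragment) % 3:
        return False  # the reporter would be translated out of frame
    codons = (fragment[i:i + 3] for i in range(0, len(fragment), 3))
    return not any(c in STOP_CODONS for c in codons)


library = ["ATGAAAGGG", "ATGTAAGGG", "ATGAAAGG"]  # toy fragments
print([f for f in library if reads_through(f)])  # ['ATGAAAGGG']
```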

Hybrid proteins with preserved terminal sequences

The invention also provides improved methods for creating functional hybrid or
chimeric proteins from two or more parent proteins, by preserving the N- and C-terminals of
the original protein or proteins, or by providing terminal ends which are appropriate for, or
compatible with, the proteins. This includes, for example, terminals which promote functional
protein folding, and is particularly useful for proteins which are sensitive to alterations in the
C- or N-terminal, or which are sensitive to folding conditions associated with one or both
terminal ends. To facilitate this method, gene dimers can be made with linkers which
preferably have at least two unique restriction sites.

In one embodiment, randomly generated gene monomer-length DNA fragments are
circularized by ligating the 3'-end of the truncated gene (the second gene of the dimer) to the
5'-end of the truncated gene (the first gene of the dimer). This procedure results in the fusion
of the corresponding new C-terminus of the second protein with the new N-terminus of the
first protein. After digestion of the circular DNA fragments with, e.g., restriction enzymes
that cut within the linker sequence, amplification by PCR when appropriate, and ligation into
a corresponding expression vector, the resulting hybrid proteins maintain the original
N-terminus of the second protein and the C-terminus of the first protein. They also contain
intervening covalent crossovers between the two proteins.
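The circularize-and-reopen step is easiest to follow on strings. In the toy model below, upper-case letters stand for the first parent gene (A), lower-case letters for the second parent gene (B), and `|` for the linker with its cut site; a monomer-length fragment spanning the linker is circularized and then reopened at the linker. The letter encoding and the function name `hybrid_from_fragment` are hypothetical, and the sketch tracks only which parent each position comes from, not codons or restriction overhangs.

```python
def hybrid_from_fragment(dimer: str, start: int, length: int) -> str:
    """Excise dimer[start:start+length], circularize it, and reopen it at the linker."""
    frag = dimer[start:start + length]
    cut = frag.index("|")  # the fragment must span the linker (ValueError otherwise)
    # Circularizing joins the fragment's 3' end (truncated B) to its 5' end
    # (truncated A); cutting at the linker then yields "after linker" + "before linker".
    return frag[cut + 1:] + frag[:cut]


gene_a = "ABCDEFGH"   # first parent, 5' half of the dimer
gene_b = "abcdefgh"   # second parent, 3' half of the dimer
dimer = gene_a + "|" + gene_b

print(hybrid_from_fragment(dimer, start=3, length=9))  # abcDEFGH
```

Whatever the fragment's start position, the product starts with B's original N-terminal end and finishes with A's original C-terminal end; only the crossover point (here between `c` and `D`) moves.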



WO 01/30998    PCT/US00/29717



In another embodiment, gene concatemers can be constructed from the same, or 
several different, parent genes. After one or more additional cycles of random fragmentation, 
selection, and circularization, a gene or protein library can be obtained which corresponds to 
hybrid proteins consisting of several different fragments of the parent protein(s). 

The invention thus provides for methods to modify chemical, physical and/or 
functional properties of a protein by creating a hybrid between the protein and another protein 
having different properties. For example, one property residing in the N-terminal of one 
protein may be combined with a property residing in the C-terminal of another protein, and 
a hybrid protein created which fully or partially retains desirable properties of the parent 
proteins. 

The above features and many other advantages of the invention will become better 
understood by reference to the following detailed description when taken in conjunction with 
the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1A shows a schematic description of the construction of a gene dimer. Each
dimer can be a homodimer or a heterodimer.

FIG. 1B shows a strategy for constructing a library of gene fragments corresponding
in size to the size of the gene or genes that encode for the original protein(s).

FIG. 2A shows restriction sites for the digestion of two isolated plasmids from active
clones of Green Fluorescent Protein (GFP).

FIG. 2B shows results of inserts deduced from the double enzyme digestion of
isolated plasmids from active clones of Green Fluorescent Protein. The double enzyme
digestion consisted of BamHI+EcoRI, BamHI+XhoI, and BamHI+5/?I. Obtained fragments
(insert types) are: (a) intact GFP gene with extra fragment upstream; (b) intact GFP with an
extra fragment downstream; (c) two overlapped fragments; (d) recovered wild-type or
wild-type-like genes; (e) complementary fragments; and (f) truncated genes.

FIG. 3 shows the construction of a gene heterodimer according to one embodiment
of the invention.

FIG. 4A and 4B show two possible strategies for constructing a library of gene
fragments, using heterodimers of FIG. 3, to obtain hybrid genes corresponding in size to the
size of the genes that encode the original proteins.

FIG. 5 shows a strategy for constructing a library of hybrid proteins with one
crossover between the two parent proteins, or with small interior sequence deletions or
duplications. "X" designates the position of the crossover between the two proteins.

FIG. 6 shows one strategy for constructing a library of hybrid proteins with several
crossover points between two or more parent proteins. "X" designates the position of the
crossovers between the different proteins.

FIG. 7 shows another strategy for constructing a library of hybrid proteins with several
crossover points between two or more parent proteins.

FIG. 8 shows N-terminal nucleotide and amino acid sequences for two hybrid proteins
constructed according to the invention. Sequences originating from BM3 are in bold letters
and sequences originating from 1A2 are in italic letters.

FIG. 9 shows the nucleotide sequence for human cytochrome P450 1A2 having a
modified N-terminus (See Fischer et al., 1992) [SEQ ID NO: 27].

FIG. 10 shows the nucleotide sequence for the heme domain of mutant P450 BM3
(See Schwaneberg et al., 1999) [SEQ ID NO: 28].

FIG. 11 shows the nucleotide sequence for a hybrid gene of the invention (RC1) [SEQ
ID NO: 29].

FIG. 12 shows the nucleotide sequence for a hybrid gene of the invention (RC2) [SEQ
ID NO: 30].

FIG. 13 shows the nucleotide sequence for a hybrid gene of the invention (RC3) [SEQ
ID NO: 31].

FIG. 14 shows the nucleotide sequence for a hybrid gene of the invention (RC4) [SEQ
ID NO: 32].

FIG. 15 shows the nucleotide sequence for a hybrid gene of the invention (RC5) [SEQ
ID NO: 33].

DETAILED DESCRIPTION OF THE INVENTION 

The object of this invention is to provide improved methods for creating novel protein 
sequences, including the production of libraries of genes encoding for polypeptides or 
proteins. These methods involve the creation of gene or protein libraries for single proteins, 
and for hybrid proteins which contain fragments from several different proteins. In a preferred 
embodiment, the protein library is constructed so that the N- and C-terminal ends of the 
protein are preserved. In particular, the method provides for the efficient creation of random 
or partially random libraries which can be screened for functional proteins. 

Definitions 

In any identified embodiments, the terms about, approximately, and variants thereof
mean within 50%, preferably within 25%, and more preferably within 10% of a given value
or range. Alternatively, the term "about" means that the value is within an acceptable
standard error of the mean, when considered by one of ordinary skill in the art.

The term library, as used herein, means a collection of proteins, polypeptides or 
polynucleotides. A gene or DNA library is a collection of polynucleotides or DNA sequences, 
and generally includes polynucleotides or sequences that correspond to, are derived from, or 
are in some way related to one or more parent genes that can be expressed to produce one or 
more polypeptides or proteins. A protein library is a collection of polypeptides or amino acid 
sequences that correspond to, are derived from, or are in some way related to one or more
parent polypeptides or proteins, and may also encompass a corresponding gene library. 

A protein, polypeptide, polynucleotide or gene may be native or wild-type, meaning
that it occurs in nature; or it may be a hybrid, mutant, variant or modified, meaning that it has
been made, altered, derived, or is in some way different or changed from a native protein or
gene, or from another mutant. A hybrid gene or protein can also be called a chimeric gene or
protein.

A crossover is a point in a hybrid or chimeric polynucleotide or
polypeptide sequence at which a section of the hybrid or chimeric polynucleotide or
polypeptide sequence originating from one parent is connected to a section originating from
another parent.

A parent or template polynucleotide or gene is any polynucleotide or gene from which
any other polynucleotide or gene is derived or made, using any methods, tools or techniques,
and whether or not the parent is itself a native or mutant polynucleotide or gene. Likewise,
a parent or template polypeptide or protein is any polypeptide or protein from which any other
polypeptide or protein is derived or made, using any methods, tools or techniques, and
whether or not the parent is itself a native or mutant polypeptide or protein.

The terms monomer, dimer, or polymer describe a polypeptide, polynucleotide,
protein, or gene, in the form of one, two, or several components, or "mers", respectively.
Further, a "homodimer" may be a polypeptide, polynucleotide, protein, or gene, made from
two components originating from the same parent polypeptide, polynucleotide, protein, or
gene, in native or modified form. A "heterodimer" is a polypeptide, polynucleotide, protein,
or gene made from two components originating from different parents, each of which encodes
or corresponds to all or part of a different protein, native or modified. The term gene
"concatemer" herein refers to a polynucleotide consisting of several genes or gene fragments, from
one or more parent genes, in sequence with or without linker DNA in between each fragment.

The term fragment means any part of a larger whole, including any rearrangement of
parts which make up the whole. This includes polypeptide sequences obtained from, or
corresponding to, all or part of the amino acid sequence of a functional protein. The term
fragment also includes polynucleotide sequences obtained from or corresponding to all or part
of the nucleotide sequence of a gene. For example, in the molecular cloning of a gene from
genomic DNA, DNA fragments are generated, some of which will encode the desired gene.
The DNA may be restricted, i.e., cleaved or cut, into fragments at specific sites using various
restriction enzymes. Any suitable restriction enzyme may be used, including, but not limited
to, XhoI, EcoRI, PstI, SacI, HindIII, StuI, XbaI, BamHI, SalI, and MfeI. Alternatively, one
may use DNase in the presence of manganese to fragment or digest the DNA, or the DNA can
be physically sheared, as for example, by sonication. DNA fragments can then be separated
according to size by standard techniques, including but not limited to, agarose and
polyacrylamide gel electrophoresis and column chromatography.
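Cutting at specific sites can be mimicked by locating recognition sequences and splitting the top strand at each enzyme's cut position. The recognition sequences and top-strand cut positions below are the standard ones for four of the enzymes listed (EcoRI G^AATTC, BamHI G^GATCC, XhoI C^TCGAG, HindIII A^AGCTT). The `digest` helper and the toy sequence are hypothetical, and the sketch ignores the bottom strand, the resulting overhangs and methylation sensitivity.

```python
import re

# Recognition sequence and top-strand cut offset (position of "^") for a few enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "XhoI": ("CTCGAG", 1),     # C^TCGAG
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def digest(seq: str, *enzymes: str) -> list[str]:
    """Split a linear top strand at every recognition site of the given enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # A zero-width lookahead also finds overlapping occurrences of the site.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("AAGGATCCTTGAATTCAA", "BamHI", "EcoRI"))
# ['AAG', 'GATCCTTG', 'AATTCAA']
```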

A limited treatment or digestion of DNA means to treat or digest DNA under such
conditions that a substantial portion of the treated or digested DNA fragments are
approximately of a predetermined size, or approximately within a predetermined size range.
The degree of digestion can be controlled, e.g., by limiting the time of the treatment/digestion
process, or by altering the treatment/digestion conditions so as to slow down or limit the DNA
fragmentation. The optimal processing time and/or conditions to achieve the desired degree
of DNA fragmentation are advantageously determined experimentally for each specific
treatment/digestion.
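Why the degree of digestion must be tuned can be seen in a deliberately simple model, assumed here for illustration only: each internal junction of a dimer of length 2L suffers a double-strand break independently with probability p, where p grows with digestion time. The sketch estimates the fraction of DNA mass that falls in a size window around the monomer length L. Too little digestion leaves mostly long pieces, and too much leaves mostly short ones. The function name, the window, the lengths and the p values are all hypothetical, and, as stated above, real conditions have to be optimized experimentally.

```python
import random


def mass_in_window(total_len: int, p: float, lo: int, hi: int,
                   trials: int = 200, seed: int = 1) -> float:
    """Fraction of nucleotides found in fragments with lo <= length <= hi."""
    rng = random.Random(seed)
    in_window = 0
    for _ in range(trials):
        start = 0
        for pos in range(1, total_len + 1):
            # A break after `pos` occurs with probability p; the molecule end always ends a piece.
            if pos == total_len or rng.random() < p:
                if lo <= pos - start <= hi:
                    in_window += pos - start
                start = pos
    return in_window / (trials * total_len)


L = 1000  # hypothetical gene length in bp; the dimer is 2L
for p in (0.0002, 0.001, 0.005):
    print(p, round(mass_in_window(2 * L, p, int(0.9 * L), int(1.1 * L)), 3))
```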

A polypeptide (one or more peptides) or protein is a chain of chemical building blocks 
called amino acids that are linked together by chemical bonds called peptide bonds. 

The properties of a polypeptide or protein include chemical, physical, or functional
properties, which may be derived from characteristics such as amino acid composition and
peptide chain folding. Chemical and physical properties are represented by, e.g., charge,
isoelectric point (IP), water solubility, cell membrane solubility and/or binding,
hydrophobicity, hydrophilicity, lipophobicity, lipophilicity, size, and stability. Functional
properties of a protein or enzyme include, but are not limited to, foldability (i.e., the ability
of the enzyme to fold in the desired manner), expressability (i.e., the ability of the enzyme to
be expressed in the desired manner and/or amount), the specific reaction catalyzed, substrate
specificity, reaction product, and enzyme activity.

A membrane-associated protein or polypeptide is a protein or polypeptide which can
have at least one part of its polypeptide chain integrated in or associated with a cell membrane.

DNA (deoxyribonucleic acid) means any chain or sequence of the chemical building
blocks adenine (A), guanine (G), cytosine (C) and thymine (T), called nucleotide bases, that
are linked together on a deoxyribose sugar backbone. DNA can have one strand of nucleotide
bases, or two complementary strands which may form a double helix structure.
A polynucleotide, nucleotide sequence or oligonucleotide is a series of nucleotide
bases (also called "nucleotides") in DNA, and means any chain of two or more nucleotides.
A nucleotide sequence typically carries genetic information, including the information used
by cellular machinery to make polypeptides, proteins and enzymes. These terms include
double or single stranded genomic and cDNA, as well as any synthetic and genetically
manipulated polynucleotide. The DNA and polynucleotides herein may be flanked by natural
regulatory sequences, or may be associated with heterologous (non-native) sequences,
including promoters, enhancers, response elements, signal sequences, polyadenylation
sequences, introns, 5'- and 3'-non-coding regions, linker regions, sequences containing
specific sites recognized by restriction enzymes, and the like. The nucleic acids in the present
invention may also be modified by the many means known in the art.

The single- or double-stranded polynucleotide sequences described herein may be
multiplied or amplified by any means known in the art. One preferred technique is the
polymerase chain reaction, or PCR. Generally, PCR involves the use of (1) one or more
templates, which in this context relates to DNA sequences to be amplified; and (2) primers,
which are DNA sequences, generally of limited length, which are specific for or
complementary to regions of DNA. Primers may thus be used to, e.g., initiate DNA
polymerization in vitro in the presence of DNA polymerase. When coupled to a reporter
molecule such as a radionuclide or a fluorescent molecule, primers may also be used to
identify whether a certain DNA segment contains a complementary sequence. If desired,
error-prone PCR may be used to create variants or mutants of a template molecule.

The single- or double-stranded polynucleotide sequences described herein may be
ligated, i.e., joined. For example, several DNA strands can be joined to one linear sequence,
forming e.g. a gene dimer, concatemer, or the like. Also, a circular or circularized
polynucleotide can be obtained by ligating the two ends of a single DNA molecule to each other, a process
which may also be referred to as circularization. The term linearization can be used to
describe the formation of a linear or linearized sequence from a circular sequence by, e.g.,
cutting the circular sequence with a restriction or other enzyme. Any methods known in the
art may be used for DNA ligation. Ligation conditions may be designed to favor
circularization over concatemerization, or the reverse, by e.g. choice of DNA concentration,
or by treating the ends of the DNA strands. Examples of the latter include converting staggered
ends, which have single-stranded cohesive ends, to blunt ends, or treating the DNA strands with
suitable restriction enzymes. Further, the term ligation may also be used e.g. in a context
describing the insertion of a gene into a vector, as described herein.

Transcriptional and translational control sequences are DNA regulatory sequences,
such as promoters, enhancers, terminators, and the like, that provide for the expression of a
coding sequence in a host cell. In eukaryotic cells, polyadenylation signals are control
sequences. A promoter or promoter sequence is a DNA regulatory region capable of binding
RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding
sequence. The promoter sequence is bounded at its 3' terminus by a transcription initiation site
and extends upstream (5' direction) to include the minimum number of bases or elements
necessary to initiate transcription at levels detectable above background. Within the promoter
sequence will be found a transcription initiation site (conveniently defined for example, by
mapping with nuclease S1), as well as protein binding domains (consensus sequences)
responsible for the binding of RNA polymerase. As described, promoter DNA is a DNA
sequence which initiates, regulates, or otherwise mediates or controls the expression of the
coding DNA. A promoter may be "inducible", meaning that it is influenced by the presence
or amount of another compound (an "inducer"). For example, inducible promoters include
those which initiate or increase the expression of a downstream coding sequence in the
presence of a particular inducer compound. A "leaky" inducible promoter is a promoter that
provides a high expression level in the presence of an inducer compound and a comparatively
very low, but at minimum detectable, expression level in the absence of
the inducer.

A coding sequence or a sequence encoding a polypeptide, protein or enzyme is a 
nucleotide sequence that, when expressed, results in the production of that polypeptide, 
protein or enzyme, i.e., the nucleotide sequence encodes an amino acid sequence for that 

polypeptide, protein or enzyme. A coding sequence is under the control of transcriptional and
translational control sequences in a cell when RNA polymerase transcribes the coding
sequence into mRNA, which is then trans-RNA spliced and translated into the protein
encoded by the coding sequence. Preferably, the coding sequence is a double-stranded DNA
sequence which is transcribed and translated into a polypeptide in a cell in vitro or in vivo
when placed under the control of appropriate regulatory sequences. The boundaries of the
coding sequence are determined by a start codon at the 5' (amino) terminus and a translation
stop codon at the 3' (carboxyl) terminus. More than one stop codon can be used to terminate
the translation of a DNA sequence. For example, to ensure termination of translation of
a DNA segment that has been truncated at the 5' and/or 3' end, stop codons can be provided
in all three reading frames proximal to, i.e., near, the 3' end. A coding sequence can include, but
is not limited to, prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA
sequences from eukaryotic (e.g., mammalian) DNA, and even synthetic DNA sequences. If
the coding sequence is intended for expression in a eukaryotic cell, a polyadenylation signal
and transcription termination sequence will usually be located 3' to the coding sequence.
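The start/stop boundary rule above can be illustrated with a minimal Python sketch. The helper name `find_first_orf` is hypothetical. The sketch assumes a single DNA strand, the standard stop codons TAA, TAG and TGA, and ATG as the only start codon. It ignores the reverse strand, introns and alternative start codons.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_first_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first ATG...stop open reading frame on this strand.

    `end` is exclusive and includes the stop codon. Returns None if no ATG is
    followed by an in-frame stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Read codons in the frame defined by this ATG until a stop codon appears.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop for this ATG; try the next ATG.
        start = seq.find("ATG", start + 1)
    return None

print(find_first_orf("CCATGAAATAGCC"))  # (2, 11)
```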

The term gene, also called a structural gene, means a DNA sequence that codes for or
corresponds to a particular sequence of amino acids which comprise all or part of one or more
proteins or enzymes, and may or may not include regulatory DNA sequences, such as
promoter sequences, which determine for example the conditions under which the gene is
expressed. A gene encoding a protein of the invention for use in an expression system,
whether genomic DNA or cDNA, can be isolated from any source, particularly from a human
cDNA or genomic library. Methods for obtaining genes are well known in the art (see, e.g.,
Sambrook et al., 1989). Accordingly, any animal cell potentially can serve as the nucleic acid
source for the molecular cloning of the gene of interest. The DNA may be obtained by
standard procedures known in the art, such as from cloned DNA (e.g., a DNA "library"), from
a cDNA library prepared from tissues with high-level expression of the protein, by chemical
synthesis, by cDNA cloning, or by the cloning of genomic DNA, or fragments thereof,
purified from the desired cell. Clones derived from genomic DNA may contain regulatory and
intron DNA regions in addition to coding regions; clones derived from cDNA will not contain
intron sequences.

Proteins and enzymes are made in the host cell using instructions in DNA and RNA,
according to the genetic code. Generally, a DNA sequence having instructions for a particular
protein or enzyme is transcribed into a corresponding sequence of RNA. The RNA sequence
in turn is translated into the sequence of amino acids which form the protein or enzyme.

The term reporter herein means any molecule, or a portion thereof, that is detectable
or measurable, for example, by optical detection. In addition, the reporter may associate or
be associated with a molecule or a particular marker or characteristic of the molecule, or may
itself be detectable, to permit identification of the molecule or the presence or absence of a
characteristic of the molecule. In the case of molecules such as polynucleotides, such
characteristics include size, molecular weight, the presence or absence of particular
constituents or moieties (such as particular nucleotide sequences or restriction sites), and
polypeptides which the reporter polynucleotide encodes. The term label can be used
interchangeably with "reporter". The reporter is typically a dye, fluorescent, ultraviolet, or
chemiluminescent agent, chromophore, or radiolabel, any of which may be detected with or
without some kind of stimulatory event, e.g., fluorescence with or without a reagent. A reporter
protein or polypeptide can be expressed from a reporter polynucleotide in vitro or in a cell,
and such expression may be indicative of the presence of another protein that may or may not
be coexpressed with the reporter. A reporter may also include any substance on or in a cell
that causes a detectable reaction, for example by acting as a starting material, reactant or
catalyst for a reaction which produces a detectable product.

An amino acid sequence is any chain of two or more amino acids. Each amino acid
is represented in DNA or RNA by one or more triplets of nucleotides. Each triplet forms a
codon, corresponding to an amino acid. For example, the amino acid lysine (Lys) can be
coded by the nucleotide triplet or codon AAA or by the codon AAG. (The genetic code has
some redundancy, also called degeneracy, meaning that most amino acids have more than one
corresponding codon.) Because the nucleotides in DNA and RNA sequences are read in
groups of three for protein production, it is important to begin reading the sequence at the
correct nucleotide, so that the correct triplets are read. The way that a nucleotide sequence
is grouped into codons is called the reading frame.
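Reading frames can be made concrete by translating one sequence in all three frames. This sketch assumes the standard genetic code, written in the compact T, C, A, G codon ordering. `CODON_TABLE` and `translate` are illustrative names, not part of the source. In frame 0, the lysine codons AAA and AAG both give K, while frames 1 and 2 give unrelated peptides.

```python
from itertools import product

# Standard genetic code; codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq: str, frame: int = 0) -> str:
    """Translate DNA in reading frame 0, 1 or 2; an incomplete trailing codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

dna = "ATGAAAAAGTAA"
for f in range(3):
    print(f, translate(dna, f))
# 0 MKK*
# 1 *KS
# 2 EKV
```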

The terms express and expression mean allowing or causing the information in a gene 
or DNA sequence to become manifest, for example producing a protein by activating the 
cellular functions involved in transcription and translation of a corresponding gene or DNA 
sequence. A DNA sequence is expressed in or by a cell to form an "expression product" such 
as a protein. The expression product itself, e.g. the resulting protein, may also be said to be 
"expressed" by the cell. A polynucleotide or polypeptide is expressed recombinantly, for 
example, when it is expressed or produced in a foreign host cell under the control of a foreign 
or native promoter, or in a native host cell under the control of a foreign promoter. 

The terms vector, cloning vector and expression vector mean the vehicle by which a 
DNA or RNA sequence (e.g. a foreign gene) can be introduced into a host cell, so as to 
transform the host and promote expression (e.g. transcription and translation) of the 
introduced sequence. Vectors typically comprise the DNA of a transmissible agent, into 
which foreign DNA is inserted. A common way to insert one segment of DNA into another 
segment of DNA involves the use of enzymes called restriction enzymes that cleave DNA at
specific sites (specific groups of nucleotides) called restriction sites. Generally, foreign DNA 
is inserted at one or more restriction sites of the vector DNA, and then is carried by the vector 
into a host cell along with the transmissible vector DNA. A segment or sequence of DNA 
having inserted or added DNA, such as an expression vector, can also be called a "DNA 
construct." 
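Cutting at restriction sites can be sketched with a linear, top-strand-only model. EcoRI does recognize GAATTC and cuts between G and A. The `digest` helper and the example sequence are hypothetical. The staggered (cohesive) ends and the bottom strand are not modelled.

```python
from typing import List

def digest(seq: str, site: str, cut_offset: int) -> List[str]:
    """Cut a linear sequence (top strand only) at every occurrence of `site`.

    `cut_offset` is the cut position within the recognition site, e.g. 1 for
    EcoRI (G^AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments

print(digest("AAGAATTCTTGAATTCCC", "GAATTC", 1))  # ['AAG', 'AATTCTTG', 'AATTCCC']
```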

A common type of vector is a plasmid, which generally is a self-contained molecule
of double-stranded DNA that can readily accept additional (foreign) DNA and which can be
readily introduced into a suitable host cell. A plasmid vector often contains coding DNA and
promoter DNA and has one or more restriction sites suitable for inserting foreign DNA.
Promoter DNA and coding DNA may be from the same gene or from different genes, and may
be from the same or different organisms. A large number of vectors, including plasmid and
fungal vectors, have been described for replication and/or expression in a variety of eukaryotic
and prokaryotic hosts. Recombinant cloning vectors will often include one or more
replication systems for cloning or expression and one or more markers for selection in the host,
e.g., antibiotic resistance. In general, the choice of vector depends on the size of the
polynucleotide sequence and the host cell to be employed in the methods of this invention.

The term host cell means any cell of any organism that is selected, modified, 
transformed, grown, or used or manipulated in any way, for the production of a substance by 
the cell, for example the expression by the cell of a gene, a DNA or RNA sequence, a protein 
or an enzyme. Appropriate host cells for expressing protein include bacteria, Archaebacteria,
fungi, especially yeast, and plant and animal cells, especially mammalian cells. Of particular
interest are E. coli, B. subtilis, S. cerevisiae, Sf9 cells, C129 cells, 293 cells, Neurospora,
CHO cells, COS cells, HeLa cells, and immortalized mammalian myeloid and lymphoid cell
lines.

The term expression system means a host cell and compatible vector under suitable 
conditions, e.g. for the expression of a protein coded for by foreign DNA carried by the 
vector and introduced to the host cell. Preferred expression systems include bacteria (e.g.,
E. coli and B. subtilis) or yeast (e.g., S. cerevisiae) host cells and plasmid vectors, and insect
host cells and baculovirus vectors.

Isolation or purification of a polynucleotide, DNA fragment, polypeptide, or protein
refers to the derivation of the molecule by removing it from its original environment (for
example, from its natural environment if it is naturally occurring, or from the host cell if it is
produced by recombinant DNA methods). Methods for polypeptide purification are well
known in the art, including, without limitation, preparative electrophoresis, isoelectric
focusing, HPLC, reversed-phase HPLC, gel filtration, ion exchange and partition
chromatography, and countercurrent distribution. A purified polynucleotide or polypeptide
may be at least about 50%, preferably at least about 75%, and most preferably at least about
90% free of the cellular components with which it was originally associated. A
"substantially pure" enzyme indicates the highest degree of purity which can be achieved
using conventional purification techniques known in the art.

The terms sequence similarity and sequence identity refer to the degree of correspondence between the
amino acid sequence of a modified protein and that of the parent protein or enzyme, or the
nucleotide sequence of a modified polynucleotide or gene and that of the parent
polynucleotide or gene. The percent sequence identity or similarity between any two protein,
amino acid, polynucleotide, or gene sequences can be determined according to an alignment
scheme, such as, e.g., the Cluster Method, wherein similarity/identity is based on the
MEGALIGN algorithm.
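Once an alignment has been produced (for example by the alignment scheme mentioned above), percent identity is a simple count over its columns. The following is a minimal sketch. It does not perform the alignment itself. It also assumes one common convention, in which columns with a gap in either sequence are left out of the denominator. Other conventions exist, and the function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap are excluded from the comparison.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y  # bool counts as 0 or 1
    return 100.0 * identical / compared if compared else 0.0

print(percent_identity("ACGT-A", "ACTTGA"))  # 80.0
```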

DNA shuffling is one approach to the creation of modified or hybrid proteins. For
instance, a gene may be randomly fragmented and reassembled by error-prone PCR. After
screening, the iterative process may be repeated until a protein with the desired properties is
produced (e.g., Stemmer, 1994). The term "shuffling" herein means performing DNA
shuffling, and includes various shuffling strategies, such as for example those described in
Ness et al., 1999; Chang et al., 1999; Minshull and Stemmer, 1999; Christians et al., 1999;
Crameri et al., 1998; Crameri et al., 1997; Zhang et al., 1997; Patten et al., 1997; Crameri et
al. (1), 1996; Crameri et al. (2), 1996; Stemmer (1), 1994; Stemmer (2), 1994; U.S. Patent No.
5,605,793; U.S. Patent No. 5,811,238; U.S. Patent No. 5,830,721; U.S. Patent No. 5,837,458;
U.S. Patent No. 5,965,408; WO 95/22625; WO 97/20078; WO 97/35966; WO 98/31837; WO
98/27230; WO 00/00632; WO 00/09679; WO 98/42832; WO 00/18906; EP 752008; and EP
0932670.

Protein fragment complementation means the mixing together of protein fragments
to restore biological function. (See, e.g., Bibi and Kaback, 1990; Burbaum and Schimmel,
1991; Hall and Frieden, 1989; Hantgan and Taniuchi, 1977; Labhardt, 1982; Shiba and
Schimmel, 1992; Taniuchi et al., 1977; and Yang and Schachman, 1993.) The fragments can
be obtained by, for instance, treating native or hybrid proteins with digestive enzymes or the
like, or can be expressed from native or modified gene fragments. The subsequent
complementation in vitro or in vivo results in the conversion of monomers to dimers or
polymers. For example, this method is useful for proteins too large to be synthesized as
monomers by current biochemical techniques.

The term circular permutation herein means cleaving or bisecting a protein at one
point and reconnecting it via the C- and N-termini to yield mutant proteins, including
functional mutants of the parent protein (See, e.g., Mullins et al., 1994; Protasova et al., 1994;
Vignais et al., 1995; Yang and Schachman, 1993; and Zhang et al., 1993). In addition,
circular permutation includes circularizing a gene, polynucleotide, or modified versions of the
same, by connecting the 5' and 3' ends with or without a linker sequence, followed by cleavage
at selected or random sites. The resulting modified gene or polynucleotide can then be used
for the expression of a modified protein.
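At the sequence level, circular permutation amounts to a rotation of the circularized sequence. The following is a minimal sketch. Letters stand for residues (the same logic applies to nucleotides), "GG" is a hypothetical linker, and `circular_permutation` is an illustrative name.

```python
def circular_permutation(seq: str, linker: str, cut: int) -> str:
    """Join the ends of `seq` through `linker`, then reopen the circle at position `cut`."""
    circle = seq + linker  # the end of seq is joined, via the linker, back to its start
    if not 0 <= cut < len(circle):
        raise ValueError("cut position must lie within the circular sequence")
    return circle[cut:] + circle[:cut]

print(circular_permutation("ABCDEF", "GG", 3))  # DEFGGABC
```

Each choice of `cut` gives a different permutant with new termini. Cutting at 0 reproduces the original order followed by the linker.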

Gene Libraries

According to the invention, circular permutation and protein complementation 
techniques can be adapted to produce hybrid genes and functional mutant proteins. These 
techniques can be combined with tools adapted from DNA shuffling, directed evolution, and 
useful screening methods, to produce gene and protein libraries containing functional mutants.
In particular, the invention provides gene dimers, comprising two gene monomers joined by
a polynucleotide linker. The invention also includes adaptations to the use of gene
concatemers, i.e., constructs of more than two monomers, to create gene and protein libraries
by techniques outlined herein.

As outlined in FIG. 1A, a parent gene corresponding to a parent protein or polypeptide
is selected. Any source of nucleic acid, preferably in purified form, can be utilized as a starting
material or parent gene of the invention. Nucleic acid sequences may be of any length,
although preferably the parent comprises a structural gene for a protein of
interest and is from 50 to 50,000 base pairs. A duplicate gene is constructed by joining two
genes, or monomers, to form a dimer. Each monomer may be identical to the parent gene or
different from the parent gene, for example by modification of the nucleotide sequence.
When both genes of the dimer are from the same parent, the resulting DNA construct can be
called a homodimer.
Alternatively, two genes from two parents may be selected, each of which can be
called a gene monomer. The two parent genes may encode related or unrelated proteins,
including for example structurally or functionally related proteins from two different
organisms. For example, a parent gene from one organism may encode a protein having
relatively high biological activity but relatively poor stability. A parent gene from another
organism may encode a similar or related protein with less biological
activity but greater stability. Also, the proteins encoded by the different parent genes may have
different physical properties in terms of, e.g., solubility, hydrophobicity, lipophilicity, or
charge. These parent genes can be used in native or modified form. The monomers are then
combined, according to the invention, to produce a gene dimer. This dimer, also called a
heterodimer, can be used to generate a library of hybrids, including functional mutant proteins
and chimeric proteins, some of which may combine the high biological activity and high
stability of the respective parents, or display other desirable properties.

As shown in FIG. 1A, a first parent gene can be obtained having a structural gene
flanked by an upstream primer 3 region (containing a restriction site R1) and a downstream
primer 1 region (containing a restriction site R3). A second parent gene, which can be the
same as or different from the first parent, is flanked by an upstream primer 2 region (containing a
restriction site R3) and a downstream primer 4 region (containing a restriction site R2).
The restriction site R3, which can be a native or an engineered site, is common to the
downstream end of the first parent and the upstream end of the second parent. Thus, primers
1 and 2 can be called "linker primers" or "linking primers", via the common R3 region.

One advantageous feature of the invention is that a high degree of sequence similarity
between two parent proteins is not a requirement. Accordingly, the sequence identity of two
parent genes may be from 0-100%. In one embodiment, the sequence identity of two parent
genes is 100%. In another embodiment, the sequence identity is less than 75%. In still
another embodiment, the sequence identity is less than 50%, or even less than 30%. In a
preferred embodiment, the sequence identity is between 15% and 50%, e.g., as determined by
BLAST analysis (see, e.g., Altschul et al., 1990; Henikoff and Henikoff, 1992; or Karlin and
Altschul, 1993).

A sufficient quantity of each parent gene can be obtained, for example, by amplifying DNA
containing the parent genes (including the primers and restriction sites) using polymerase
chain reaction (PCR) techniques, followed by purification and analysis as necessary. The
amplified DNA products are restricted with specific restriction enzymes (R1 and R3 for the
first parent; R2 and R3 for the second parent). The resulting DNA fragments are ligated by
joining the downstream end of the first parent to the upstream end of the second parent (at the
common R3 restriction site). A linker, having at least one restriction site that is unique in the
dimer construct, is interposed between the first and second parent genes, as shown. The
resulting DNA construct, a gene dimer of the two parents, is ligated, or subcloned, into a
vector for further amplification, e.g., by transformation into host cells, or by PCR. The
amplified DNA is then digested with restriction enzymes which excise the gene dimer by
cutting at R1 and R2, thus providing quantities of the gene dimer. The dimer may then be
purified and separated from the other components, using methods known in the art.
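The requirement that the linker carry a restriction site unique in the dimer construct can be checked in silico before the construct is built. BamHI's recognition sequence GGATCC is real. The gene and linker sequences and the helper names in this sketch are hypothetical, and only exact matches on either strand are counted.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def site_positions(seq: str, site: str) -> List[int]:
    """Top-strand start positions of `site` or of its reverse complement."""
    queries = {site, reverse_complement(site)}  # a single entry for palindromic sites
    hits = set()
    for q in queries:
        pos = seq.find(q)
        while pos != -1:
            hits.add(pos)
            pos = seq.find(q, pos + 1)
    return sorted(hits)

def is_unique_site(construct: str, site: str) -> bool:
    return len(site_positions(construct, site)) == 1

# Hypothetical dimers: gene 1 + linker carrying a BamHI site + gene 2.
ok_dimer = "ATGGCAGCT" + "GGATCC" + "ATGAAACTG"
bad_dimer = "ATGGGATCC" + "GGATCC" + "ATGAAACTG"  # gene 1 also contains GGATCC
print(is_unique_site(ok_dimer, "GGATCC"), is_unique_site(bad_dimer, "GGATCC"))  # True False
```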

In one embodiment, suitable for producing a random circular permutation library, the
linker sequence is preferably designed such that the reading frame is continuous and the
original 5' and 3' (upstream and downstream) ends of the structural gene are connected.
Appropriate linker sequences can be designed to insert, delete, or mutate amino acids in the protein
sequence, or they can leave the protein sequence unchanged, except that the N- and C-terminal
ends of proteins encoded by the hybrid genes will be different from those of the parent
proteins. In another embodiment, e.g., to construct a library suitable for protein fragment
complementation, the linker should preferably include a sequence encoding the stop
translation signal of the upstream fragment (e.g., in linker primer 1 of the first parent), and a
start of translation signal of the downstream fragment (e.g., in linker primer 2 of the second
parent).
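The two linker designs can be screened with simple codon checks. This sketch assumes the linker begins in frame with the upstream gene, i.e., that gene's length is a multiple of three and its own stop codon has been removed. The function names and example linkers are hypothetical. A real complementation construct also needs signals for initiating translation of the downstream fragment (e.g., a ribosome-binding site in bacteria), which the sketch does not check.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq: str) -> List[str]:
    """Split a sequence into its complete codons in frame 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def preserves_reading_frame(linker: str) -> bool:
    """Circular permutation linker: whole codons only and no in-frame stop codon."""
    return len(linker) % 3 == 0 and not any(c in STOP_CODONS for c in codons(linker))

def splits_translation(linker: str) -> bool:
    """Complementation linker: an in-frame stop codon followed later by an ATG."""
    for i, codon in enumerate(codons(linker)):
        if codon in STOP_CODONS:
            return "ATG" in linker[3 * (i + 1):]
    return False

print(preserves_reading_frame("GGTGGCTCT"))  # True  (Gly-Gly-Ser)
print(preserves_reading_frame("GGTTAAGGC"))  # False (in-frame TAA)
print(splits_translation("TAAGGAGGATG"))     # True
```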

As shown in FIG. 1B, the purified gene dimer is then cut or fragmented, for example
by limited digestion with an enzyme such as, e.g., a nuclease, or DNase I, or by mechanical
shearing forces such as sonication. DNA fragments of various sizes are generated in this way.
Even if the dimer is cut at random sites, the type or relative degree of fragmentation can often
be modulated in the chosen fragmentation technique, for example by time of exposure.
Appropriate conditions for each chosen application may require individual optimization,
based upon knowledge in the art. See step 1 of FIG. 1B. Using any suitable method, or
combination of methods, for screening, isolating, separating, or purifying DNA, a population
of DNA pieces or fragments is selected. For example, a population of DNA fragments having
a predetermined size, or being within a predetermined size range, can be selected and isolated.
One possible technique is gel electrophoresis. See step 2 of FIG. 1B. Alternatively, fragments
can be made by random primer extension (See Shao et al., 1998). If the resulting DNA
fragments are too small, they can be subjected to limited overlap extension (See Stemmer,
1994) or StEP recombination (See Zhao et al., 1998) until they have reached the desired
average size.
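Steps 1 and 2 of FIG. 1B can be pictured with a toy simulation. Many copies of a dimer are broken at random positions, and a size window around the monomer size is then selected. The simulation assumes uniformly random double-strand breaks, although real DNase I digestion or sonication is not perfectly uniform. The sizes, break counts and window are hypothetical values chosen for illustration. The number of breaks per molecule stands in for the extent of digestion.

```python
import random
from typing import List, Tuple

def random_fragments(length: int, n_breaks: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Break a linear molecule of `length` bp at `n_breaks` distinct random positions.

    Returns (start, end) intervals, end exclusive, that tile the whole molecule.
    """
    breaks = sorted(rng.sample(range(1, length), n_breaks))
    edges = [0] + breaks + [length]
    return list(zip(edges[:-1], edges[1:]))

def size_select(fragments: List[Tuple[int, int]], min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Keep fragments whose length lies in [min_len, max_len], like excising a gel band."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]

rng = random.Random(1)
monomer, linker = 1000, 30          # hypothetical sizes in bp
dimer_len = 2 * monomer + linker
pool: List[Tuple[int, int]] = []
for _ in range(200):                # 200 dimer molecules, each broken twice
    pool.extend(random_fragments(dimer_len, 2, rng))
selected = size_select(pool, 900, 1100)
print(len(pool), len(selected))
```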

Fragments made by any of the above-mentioned or other techniques can be further
separated, for instance according to size, in order to obtain a high fraction of pieces with the
desired properties. In a preferred embodiment, the isolated fragments are within a
predetermined size range encompassing, comparable to, or consistent with about the size of
a parent gene (e.g., the selected fragments are the same as or similar to a gene monomer in size).
In another preferred embodiment, most of the isolated fragments are within the following size
range: from about the size of the smaller parent gene up to, and including, about the size of
the larger parent gene (e.g., the size of the selected fragments is somewhere in between the
sizes of the two parent genes). Each fragment is likely to have different 5' and 3' ends and,
consequently, different intervening sequences.
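Monomer-sized fragments of a homodimer behave like circular permutants, and this can be checked directly. A window spanning one gene plus the linker, starting anywhere within the first copy, reads the same as a rotation of the gene joined to the linker. The sequences below are hypothetical, and the fragment size is assumed to equal the monomer plus linker length. With a heterodimer, the same windows give hybrids instead of pure permutants.

```python
def window(dimer: str, start: int, size: int) -> str:
    """The stretch of the dimer covered by a fragment starting at `start`."""
    return dimer[start:start + size]

def rotation(circle: str, k: int) -> str:
    """The circular sequence read starting from position k."""
    return circle[k:] + circle[:k]

gene = "ATGGCTAAAGGT"           # hypothetical 12-nt gene, stop codon omitted
linker = "GGCTCT"               # hypothetical in-frame linker
dimer = gene + linker + gene    # homodimer
size = len(gene) + len(linker)

# Each window starting in the first copy equals a rotation of gene + linker.
for k in range(len(gene) + 1):
    assert window(dimer, k, size) == rotation(gene + linker, k)
print(window(dimer, 3, size))   # GCTAAAGGTGGCTCTATG
```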

The DNA fragments obtained using these techniques comprise a gene library
according to the invention.

The purified DNA fragments are treated as necessary or desired, for example, with a
DNA-modifying enzyme (e.g., a single-strand-specific nuclease, or a DNA polymerase such
as T4 DNA polymerase) to convert staggered ends to blunt ends. See step 3 of FIG. 1B. The
DNA is then ligated into a suitable expression vector, typically a plasmid. See step 4 of
FIG. 1B. The result is a plasmid, vector, or gene library of hybrid or permuted genes, or
complementary fragments. The expression vector is designed so that gene length is controlled
(stop codons are provided in all three reading frames). The presence of contaminants or
undesired components, e.g., wild-type genes, in this library should be relatively low, but could
be further reduced by optimizing the technique(s) used for amplifying and/or separating
different components.
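Insert lengths in such a library vary, so the reading frame that runs from an insert into the vector varies too. This is why stop codons are needed in all three frames. The check can be sketched as follows. The cassette sequences are hypothetical examples, and the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_frames(cassette: str) -> set:
    """Reading frames (0, 1, 2) of `cassette` that contain at least one stop codon."""
    frames = set()
    for frame in range(3):
        for i in range(frame, len(cassette) - 2, 3):
            if cassette[i:i + 3] in STOP_CODONS:
                frames.add(frame)
                break
    return frames

def stops_in_all_frames(cassette: str) -> bool:
    return stop_frames(cassette) == {0, 1, 2}

print(stops_in_all_frames("TAGCTAGCTAG"))  # True
print(stops_in_all_frames("GGCGGCGGC"))    # False
```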

The expression plasmids can be used to transform suitable host cells for expressing
the proteins. The genes can also be expressed using techniques such as, for example, phage
display (Johansson et al., 1999) and in vitro transcription-translation systems. The expressed
proteins and polypeptides comprise a protein library of the invention.

Briefly, the genes and proteins that are evolved using these methods can be rapidly
screened. Functional hybrids, circular permutants or complementary fragments that yield
functional protein are identified by suitable screening or selection methods. When an
expression system is used, functional proteins can be readily isolated and purified from the
expression system, or from the expression media if secreted by the host cells. For example,
assays can be used to test functional activity of the particular protein in native form.

Optionally, in cases where the parental fragments are in one continuous reading frame,
the number of hybrid variants can be reduced by, for example, ligating the N-terminal ends of the
fragments in the gene library to a gene encoding a suitable reporter protein whose start
codon of translation (ATG) has been modified or removed to prevent its independent
translation.

In a preferred embodiment, a gene library of the invention may be generated by a
method comprising the steps of: (a) constructing a gene dimer containing a linker sequence;
(b) performing limited digestion of the gene dimer to produce a population of fragments of
varying sizes; (c) isolating DNA fragments of approximately the same size as a parent gene;
and (d) inserting isolated DNA fragments into a suitable expression vector.


Preserved terminal sequences 

When constructing a library of hybrid proteins, it may be desirable that they retain the
N- and C-terminal ends of the original proteins. Conventional protein fragment
complementation and circular permutation bisect the polypeptide chain
somewhere between the original N- and C-termini, and create hybrids with new N- and
C-terminal ends. This tends to reduce the number or fraction of functional hybrids in the library,
because new terminal ends, mismatched sequences in the hybrid, or the cleavage into two
separate polypeptide fragments can impair the ability of the resulting proteins to fold
properly. This, in turn, may have an impact on protein function (See Graf and Schachman,
1993; and Hennecke et al., 1999).

This problem has been solved in one embodiment of the invention. Libraries of hybrid
proteins, especially single-chain proteins, can be made that have an N-terminal part
originating from one protein and a C-terminal part of a second protein, with both parts varying
in length. As described above, the sequence similarity between the two parent proteins may
be in the range 0-100%, since sequence similarity is not a requirement. A preferred, although
not limiting, sequence identity of the genes encoding the parent proteins is in the range
15-50%.
To obtain monomeric hybrid proteins with matched terminal ends, a gene dimer, or
concatemer, is made according to the strategy outlined in FIG. 3. This technique is similar
to the one outlined in FIG. 1A, except that in the current example, one of the parent genes has
an additional restriction site in an upstream or downstream region, e.g., a non-coding sequence
(shown in the figure as R5 of gene 2).

With reference to FIG. 3, a gene construct is made, in this case a heterodimer
comprising the genes of two different parent proteins. In other applications of the invention,
the parent proteins may originate from the same or different organisms, and may or may not
exhibit different functional or physical properties. The two genes are placed in tandem on a
single piece of DNA and are separated by a linker sequence. The linker sequence contains one
or more, preferably two, restriction sites (as shown) that are unique in the dimer construct.
The gene dimer can be constructed and amplified, for example, using PCR, and is ligated or
subcloned into a suitable cloning vector. After amplification, the constructed gene dimer is
excised and purified as necessary or desired.

The gene dimer is fragmented (e.g., by limited digestion with an enzyme such as, e.g.,
a nuclease, or DNase I, by sonication, or by random primer extension). These procedures,
outlined in FIG. 4, are similar to those described above in connection with FIG. 1B. A
population of fragments is provided, and the resulting mixture of fragments is sorted,
separated, or purified by size, for example using gel electrophoresis or other methods
described herein. Preferably, the separation or sorting procedure selects a range of fragment
sizes encompassing, or at least comparable with, about the size of a parent gene
monomer. If the DNA fragments are too small, they can be subjected to limited overlap
extension (See Stemmer, 1994) or StEP recombination (See Zhao et al., 1998) until they are on
average the approximate size of a gene monomer. Each of these fragments is likely to have
unique 5' and 3' termini, as well as a unique DNA sequence.

An alternative way to produce DNA fragments with the approximate length of a gene
monomer is to use Exonuclease III (See Henikoff, 1984). When linear DNA fragments having
blunt ends or 5'-protruding single-strand overhangs are treated with Exonuclease III, one
nucleotide at a time is removed from the 3'-end. When a population of DNA fragments of a
single, uniform length is subjected to limited treatment with Exonuclease III, the size distribution of
the obtained truncated fragments follows a Poisson distribution. This distribution has a
deviation of about 20 to 25% of the average length of the removed DNA (See
Hoheisel, 1993). When a gene monomer has n nucleotides, the desired deviation should be n/2
to obtain a library of fragments with DNA ends covering the entire length of each gene.
Therefore, the average length of the DNA to be removed from either side of the gene fragment
should be around 2n to 3n.
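The arithmetic behind the last two sentences can be written out. This assumes that the 20 to 25% figure is the ratio f of the deviation σ to the mean length μ of DNA removed.

```latex
\begin{aligned}
\sigma &= f\,\mu, \qquad 0.20 \le f \le 0.25 \\
\sigma &= \frac{n}{2} \;\Longrightarrow\; \mu = \frac{n}{2f} \\
f = 0.25 &\Rightarrow \mu = 2n, \qquad f = 0.20 \Rightarrow \mu = 2.5n
\end{aligned}
```

This places μ between about 2n and 2.5n, within the range of about 2n to 3n given above.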
It is therefore possible to put the gene dimer into a vector that is about twice (for both
sides of the dimer) two to three times the size of the gene monomer, i.e., roughly four to six
times the monomer size. The vector
should have a unique restriction site opposite the cloning sites that were used to insert the
gene dimer. This unique restriction site is used to linearize the DNA. The linear DNA is then
digested with Exonuclease III, followed by treatment with a single-strand-specific nuclease
(e.g., mung bean nuclease or S1 nuclease) so that the average size of the truncated DNA
fragments is the size of the gene monomer. The S1 nuclease digest yields blunt-ended DNA
fragments, which are required for the ligation procedure. The DNA fragments
are then separated (e.g., on an agarose gel), and fragments which are approximately the size of
the gene monomer are purified.

Yet another approach to produce DNA fragments with the length of a gene monomer
uses the inability of Exonuclease III to cut and remove alpha-thionucleotides (See Putney et
al., 1981; and King and Goodbourn, 1992). When the gene dimer is amplified by PCR using
dNTPs and a small amount of alpha-thio dNTPs, the alpha-thio dNTPs are randomly
incorporated over the entire length of the gene dimer. When the DNA fragments are
subsequently treated with Exonuclease III, they are truncated to the first thionucleotide on
each 3'-end. Therefore, the gene dimer is amplified by PCR using an amount of
alpha-thionucleotides that is adjusted such that the exonuclease and subsequent
single-strand-specific nuclease treatment will result in DNA fragments which are on average about the size
of the gene monomer. As described above, gene fragments are then separated and purified.
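The amount of alpha-thio dNTP can be reasoned about with an idealized model. Suppose each position is independently a protecting thionucleotide with probability p. Then the number of nucleotides Exonuclease III removes from a 3' end before stopping is geometrically distributed with mean (1 - p)/p, so p = 1/(μ + 1) targets a mean removal μ. The model, the function names and the example values are assumptions. In practice, the incorporated fraction may differ from the fraction supplied in the dNTP mix and would need calibration.

```python
import random

def thio_fraction_for_mean_removal(mean_removed: float) -> float:
    """Per-position probability p of a protecting thionucleotide giving the target mean removal.

    Under the idealized model, removal length k is geometric, P(k) = (1 - p)**k * p,
    with mean (1 - p) / p; solving for p gives 1 / (mean + 1).
    """
    return 1.0 / (mean_removed + 1.0)

def simulate_removal(strand_length: int, p: float, rng: random.Random) -> int:
    """Nucleotides removed from one 3' end before the first protected position."""
    for k in range(strand_length):
        if rng.random() < p:   # position k, counted from the 3' end, is protected
            return k
    return strand_length       # no protected position: the whole strand is removed

p = thio_fraction_for_mean_removal(99)   # hypothetical target of ~99 nt per end
rng = random.Random(0)
samples = [simulate_removal(10_000, p, rng) for _ in range(20_000)]
mean_removed = sum(samples) / len(samples)
print(p, round(mean_removed))
```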

The purified DNA is treated with a DNA-modifying enzyme, as needed or desired
(also as described above). See FIGS. 1B and 4. For example, a single-strand-specific nuclease
or DNA polymerase can be used to convert staggered ends to blunt ends to facilitate
subsequent steps.

The protein or polypeptide encoded by the linear construct at this stage would have
a new C-terminus in the second protein and a new N-terminus in the first protein. The linear
construct may be single-stranded or double-stranded. In a preferred embodiment, the linear
construct is double-stranded. According to the invention, the linear DNA fragments are then
circularized, e.g. by intramolecular blunt-end ligation. See FIG. 5. The 3' end of the truncated
gene, originating from the second gene of the original dimer, is fused to the 5' end of the
truncated first gene of the dimer. Circularization results in the fusion of the DNA ends
encoding the tentative new termini, corresponding to the site marked "X" in FIG. 5. The
position of the ligation site in relation to the linker sequence varies between different
constructs, as outlined in the figure. The circular DNA fragments are then treated with
restriction enzymes that cut only within the linker sequences. This eliminates the new termini
that otherwise would result, and opens the circularized construct in such a way that
preferentially preserves or reintroduces one or more original termini. Shown in FIG. 5, step
5, are examples of double-stranded linearized constructs with 4-base-pair overhangs resulting
from the restriction. When necessary, the DNA fragments can be amplified by PCR using
PCR primers that recognize the two original termini.
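The effect of circularizing a fragment and reopening it within the linker can be illustrated with plain strings. This is a minimal sketch, not the procedure itself. The parent "genes" are hypothetical letter strings (lowercase for the first gene of the dimer, uppercase for the second), and the linker is a hypothetical token. The model also discards the linker entirely instead of leaving sticky ends, and it ignores reading frame.

```python
def circularize_and_reopen(dimer: str, start: int, end: int, linker: str) -> str:
    """String model of circularization followed by cutting within the linker.

    dimer      -- first gene + linker + second gene, written 5'->3'
    start, end -- boundaries of one fragment left after limited digestion
    linker     -- linker sequence; the model removes it entirely, whereas in
                  practice restriction leaves sticky ends used for cloning
    """
    fragment = dimer[start:end]
    cut = fragment.find(linker)
    if cut < 0:
        raise ValueError("fragment does not contain the complete linker")
    # Ligating the fragment's 3' end to its 5' end and then opening the circle
    # in the linker is equivalent to rotating the fragment so that the sequence
    # downstream of the linker comes first.
    return fragment[cut + len(linker):] + fragment[:cut]


first = "abcdefghij"    # toy first (5') gene of the dimer
linker = "|LNK|"        # toy linker token
second = "KLMNOPQRST"   # toy second (3') gene of the dimer
dimer = first + linker + second

# Fragments of monomer-plus-linker length, starting at different positions.
size = len(first) + len(linker)
for start in range(0, len(first) + 1):
    print(circularize_and_reopen(dimer, start, start + size, linker))
```

Each printed construct starts with the beginning of the uppercase (second) gene and ends with the end of the lowercase (first) gene. Only the crossover position changes from fragment to fragment, and the two extreme fragments return the unchanged parents.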

If desired, the DNA fragments can also be analyzed by PCR. In a PCR reaction that uses
a primer pair in which one primer is specific for one gene while the other is specific for
the other gene, a product will only be obtained when there is a crossover in the region of the
two genes that is flanked by the two primers. Alternatively, when using two primers that are
both specific for one gene, the lack of a product indicates that a crossover has occurred. The
presence or absence of PCR products, therefore, reveals whether the crossover has taken place
in a specific region or not.
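The primer logic above can be expressed as a small decision function. This is a minimal sketch with several assumptions. The hybrid is taken to have a single crossover, with sequence 5' of the crossover coming from one parent ("B") and sequence 3' of it from the other ("A"). Primers are assumed to anneal only to their own parent, and each primer is reduced to a single hypothetical position on the hybrid.

```python
def parent_at(position: int, crossover: int, upstream: str, downstream: str) -> str:
    """Parent contributing the hybrid sequence at `position` (single crossover).

    Positions before `crossover` come from the upstream (5') parent, the rest
    from the downstream (3') parent.
    """
    return upstream if position < crossover else downstream


def product_expected(fwd_pos: int, fwd_parent: str, rev_pos: int, rev_parent: str,
                     crossover: int, upstream: str = "B", downstream: str = "A") -> bool:
    """True if both parent-specific primers can anneal to the hybrid."""
    if fwd_pos >= rev_pos:
        raise ValueError("forward primer must lie 5' of the reverse primer")
    fwd_ok = parent_at(fwd_pos, crossover, upstream, downstream) == fwd_parent
    rev_ok = parent_at(rev_pos, crossover, upstream, downstream) == rev_parent
    return fwd_ok and rev_ok


for x in (200, 600, 1000):
    mixed = product_expected(300, "B", 900, "A", x)  # one primer per parent
    same = product_expected(300, "B", 900, "B", x)   # both primers on parent B
    print(f"crossover at {x}: mixed pair -> {mixed}, same-parent pair -> {same}")
```

With one primer per parent, a product appears only when the crossover falls between the two primers. With both primers specific for parent B, the product disappears as soon as the crossover lies upstream of the reverse primer.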

The linear DNA fragments obtained using these techniques comprise a gene library
according to the invention.

The fragments can thereafter be ligated into a suitable expression vector. The vector
is pretreated in such a way that DNA ends are compatible for ligation with the DNA
fragments, and enable correct transcription of the inserted genes as well as the correct
initiation and termination of their translation. The expression vector might also contain a
sequence encoding a propeptide or a prepropeptide (e.g. signal sequence) that is necessary for
the correct localization and/or folding of the protein. Optionally, the expression vector might
also contain the sequence of a reporter gene, whose 5' end has been fused in the same reading
frame to the ligated genes of the hybrid protein variants. Preferably, the intrinsic start codon
of the reporter gene has been removed, to promote a selection for those gene variants that
encode the hybrid proteins in one continuous reading frame. The result is a vector gene library
of hybrid or permuted genes, or fragments encoding complementary protein fragments.
In another embodiment, hybrid polynucleotides in a gene library according to the
invention, or in a gene library produced according to the invention, can be further mutated by
any suitable methods known in the art. For example, the entire library, a selected group of
hybrid genes from the library, or polynucleotides selected from the library can be subjected
to error-prone PCR, methods for introducing point mutations, and/or various DNA shuffling
techniques known in the art (see, for example, Stemmer, 1994 and Zhao et al., 1998).

The expression vector, e.g. a plasmid, can be used to transform a suitable host for
expression of the proteins, creating a protein library according to the present invention.
Alternatively, the genes can be expressed in vitro, e.g. using an in vitro
transcription-translation system. The resulting hybrid proteins maintain the original
N-terminus of the second protein and the C-terminus of the first protein, while containing a
single crossover between the two proteins. No artificial linker has to be used to fuse the
original termini (as is required for circular permutation), and the method can therefore be
applied to proteins which have buried termini. It can also be applied to proteins which have no
independently folding domains, since full-length polypeptide chains are produced. Functional
hybrid proteins, circular permutants, or complementary fragments that yield functional protein
are preferably identified by screening or selection.

In yet another embodiment of the invention, a modification of the techniques described
above can be applied to obtain a library of hybrid proteins that have more than one crossover
at structurally related sites. In this procedure, a unique site for a DNA-cleaving enzyme that
leaves nonidentical single-stranded protruding ends, for example type II restriction enzymes, is
introduced beforehand in the linker sequence shown in FIG. 3. After limited DNA digestion,
isolation of selected gene fragments and construction of circular fragments, as outlined in
FIGs. 4 and 5, the circularized DNA fragments are cut with these specific enzymes. See
FIG. 6 (section 5). The obtained linear fragments are then ligated to each other under
conditions that favor intermolecular ligation over intramolecular ligation, in order to obtain
long concatemers of gene fragments. All genes will be in the right orientation
(5'-3':5'-3'...) in the concatemers, because the protruding ends on each side are not
identical. The concatemers are then subjected to another cycle of fragmentation and
separation, similar to that described in FIG. 4, to obtain fragments that are approximately the
length of a parent gene monomer. After creating blunt ends, these new fragments are
circularized according to FIG. 5. The circular DNA fragments can be used for more rounds of
shuffling of different parents and fragments according to FIG. 6. After the chosen number of
shuffling cycles, the circularized fragment is cut, for example with restriction enzymes R4 and
R5 in FIG. 5, to generate a library of linear gene constructs to be fused into expression vectors.
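A small compatibility model shows why nonidentical protruding ends give head-to-tail concatemers. The sketch rests on three simplifying assumptions. First, both ends of each linear fragment come from the same cut, so the overhang on one end is the reverse complement of the overhang on the other. Second, two ends ligate only when their overhangs are reverse complements. Third, 5'/3' overhang polarity is ignored. The overhang sequences are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two protruding ends (each read 5'->3') pair if one is the reverse complement of the other."""
    return overhang_a == revcomp(overhang_b)


def allowed_joins(overhang: str) -> dict:
    """Which orientation pairs of identical fragments can be joined right end to left end.

    Assumption: both ends come from the same cut, so a fragment in forward
    orientation has (left, right) overhangs (revcomp(X), X); flipping the
    fragment swaps the two ends.
    """
    forms = {"fwd": (revcomp(overhang), overhang), "rev": (overhang, revcomp(overhang))}
    return {
        (a, b): can_anneal(forms[a][1], forms[b][0])
        for a, b in product(forms, repeat=2)
    }


print(allowed_joins("AGG"))   # non-palindromic, e.g. a 3-nt SfiI-type end
print(allowed_joins("AATT"))  # palindromic, e.g. an EcoRI end
```

For the non-palindromic overhang only fwd-fwd and rev-rev joins are allowed, which is the same head-to-tail concatemer read from either strand. A palindromic overhang also permits head-to-head and tail-to-tail joins.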

The corresponding protein library consists of hybrid proteins made of multiple
fragments from the proteins encoded by the original gene dimer. In addition, if this procedure
is applied to a mixture of heterodimers, or to concatemers of various combinations of the
genes corresponding to several proteins (for example, produced by ligating a mixture of all the
genes into which the linker with the appropriate type IIs restriction site has already been
introduced), gene libraries encoding hybrid proteins, and corresponding protein libraries,
consisting of fragments of multiple parent proteins can be produced.

In still another embodiment of the invention, the techniques described above can be
extended to produce hybrid proteins with more than one crossover site, as shown in FIG. 7.
In this procedure, a second library of single-crossover hybrids is obtained as described
above, with the exception that the two parent proteins are exchanged ("mirror" library).
Thus, the gene that is on the 5' end in one library is on the 3' end in this second library.
Both hybrid gene libraries can be mixed and used in a conventional DNA shuffling experiment
(see, for example, Stemmer, 1994 and Zhao et al., 1998). In the members of the shuffled
library, many crossover sites may be recombined and complete multiple shuffling is achieved.

Examples of practicing the invention are provided, and are understood to be exemplary
only, and do not limit the scope of the invention or the appended claims. A person of ordinary
skill in the art will appreciate that the invention can be practiced in many forms according to
the claims and disclosures here. All polynucleotide and polypeptide sequences referred to in
the Examples are listed in Tables 1 and 2, respectively, together with sequence identification
numbers (SEQ ID NOs).



TABLE 1 - Nucleotide Sequences

Name                            SEQ ID NO  Nucleotide sequence¹ *
P1                              1          TAA GGG GAA CTC GAG ATG AGT AAA GGA GAA GAA CTT
P2                              2          CAT CTC GAG TTC CCC TTA TTT GTA TAG TTC ATC CAT
L1&2                            3          TAA GGG GAA CTC GAG ATG
P3                              4          ACC ATG ATT ACG CCA AGC TTG
P4                              5          GGG CCC GTA CGG CCG ACT AGT
Pstp                            6          p-CGA TTA TTT TTC AGC TAG GCC TAG TTG GTA ATG GTA GCG AC
Pla2u                           7          GCA GGG CCC CAG AGC TCA TGG CTC TGT TAT TAG CAG TTT TTC
Pla2d                           8          CGC TCT AGA GGT ACC CCA ATT GAT GGA GAA GCG CCG C
Pbm3u                           9          CGA CGG ATC CGG AAG GAA GGG CCC ATA TGA CAA TTA AAG AAA TGC CTC AG
Pmund                           10         GCA AAG ACC AAT CGT ATC AAG CG
Pmunu                           11         GCT TGA TAC GAT TGG TCT TTG C
Pnded                           12         CGT TTA GCA TGT GCG TTA ATA AAT C
Pndeu                           13         GAT TTA TTA ACG CAC ATG CTA AAC G
Pbm3d                           14         CGT CGG TAC CCT CGA GTG AAG TGC TAG GTG AAG GAA TAC C
pB-1B linker sequence           15         TTG GGG TAC CTC TAG AAC TAG TGG ATC CGG AAG GAA GGG CCC AT
Pla2i2r                         16         CAG CTG GGG TCT GTC AGA GAG C
Pcatc                           17         CGA CGA TCT AGA TTA CGC CCC GCC CTG CCA CTC
pCW1A2cat linker sequence       18         TGG CCT GGG TCC CCT GCT
Pscctu                          19         CCT GGG TCC CCT GCT AGC GAG AAA AAA ATC ACT GGA TAT ACC
Pscctd                          20         GGT ATA TCC AGT GAT TTT TTT CTC GCT AGC AGG GGA CCC AGG
Pbm3bam                         21         CCA GGA TCC ATC GAT GCT TAG GAG GTC ATA TGA CAA TTA AAG AAA TGC CTC
RC1 N-terminal sequence         22         ATG ACA ATT AAA GAA ATG CCT CAG CCA AAA ACG TTT GGA GAG GTG CTC AAG GGT TTG AGG
RC2 N-terminal sequence         23         ATG ACA ATT AAA GAA ATG CCT CAG CCA AAA CGT TTG GAG AGC CCC AAA GGC CTG AAA AGT
RC3 crossover region            24         AGC ATG CGT TTA AAC CGT TTG GAC CCC TCT GAG TTC CGG C
RC4 crossover region            25         CAG TCA GCA GGC AAT GAA AGG CCC GCC GGC GCC TGG CCC A
RC5 crossover region            26         TAT CGC CGT GCA GCT TGT TCC AAG GGC CGG CCT GAC CTC T
modified P450 1A2               27         See FIG. 9
heme domain of mutant P450 BM3  28         See FIG. 10
RC1 complete sequence           29         See FIG. 11
RC2 complete sequence           30         See FIG. 12
RC3 complete sequence           31         See FIG. 13
RC4 complete sequence           32         See FIG. 14
RC5 complete sequence           33         See FIG. 15

¹ Sequences are listed in 5'->3' direction if not otherwise indicated.
* "p" indicates that the 5' end is phosphorylated.
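Several entries in Table 1 can be cross-checked against the text. The sketch below uses the sequences as listed, with the codon spacing removed. It confirms that Pnded is the reverse complement of Pndeu, that Pscctd is the reverse complement of Pscctu, and that Pmund begins with the reverse complement of Pmunu (carrying one extra 3' G). It also lists which of the restriction sites named in Example 2 occur in the outer primers, using the standard recognition sequences. The dictionary and helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove the spaces used to group codons in Table 1."""
    return seq.replace(" ", "").upper()


def revcomp(seq: str) -> str:
    """Reverse complement of a (spaced or unspaced) DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


TABLE1 = {
    "Pla2u": "GCA GGG CCC CAG AGC TCA TGG CTC TGT TAT TAG CAG TTT TTC",
    "Pla2d": "CGC TCT AGA GGT ACC CCA ATT GAT GGA GAA GCG CCG C",
    "Pbm3u": "CGA CGG ATC CGG AAG GAA GGG CCC ATA TGA CAA TTA AAG AAA TGC CTC AG",
    "Pmund": "GCA AAG ACC AAT CGT ATC AAG CG",
    "Pmunu": "GCT TGA TAC GAT TGG TCT TTG C",
    "Pnded": "CGT TTA GCA TGT GCG TTA ATA AAT C",
    "Pndeu": "GAT TTA TTA ACG CAC ATG CTA AAC G",
    "Pbm3d": "CGT CGG TAC CCT CGA GTG AAG TGC TAG GTG AAG GAA TAC C",
    "Pscctu": "CCT GGG TCC CCT GCT AGC GAG AAA AAA ATC ACT GGA TAT ACC",
    "Pscctd": "GGT ATA TCC AGT GAT TTT TTT CTC GCT AGC AGG GGA CCC AGG",
}

# Standard recognition sequences of the enzymes named in Example 2.
SITES = {"SacI": "GAGCTC", "XbaI": "TCTAGA", "BamHI": "GGATCC",
         "XhoI": "CTCGAG", "NdeI": "CATATG", "MfeI": "CAATTG"}


def sites_in(name: str) -> list:
    """Enzymes from SITES whose recognition sequence occurs in the named primer."""
    seq = clean(TABLE1[name])
    return [enzyme for enzyme, site in SITES.items() if site in seq]


def is_revcomp_pair(a: str, b: str) -> bool:
    """True if primer b is exactly the reverse complement of primer a."""
    return clean(TABLE1[b]) == revcomp(TABLE1[a])


if __name__ == "__main__":
    print("Pnded = revcomp(Pndeu):", is_revcomp_pair("Pndeu", "Pnded"))
    print("Pscctd = revcomp(Pscctu):", is_revcomp_pair("Pscctu", "Pscctd"))
    print("Pmund starts with revcomp(Pmunu):",
          clean(TABLE1["Pmund"]).startswith(revcomp(TABLE1["Pmunu"])))
    for primer in ("Pla2u", "Pla2d", "Pbm3u", "Pbm3d"):
        print(primer, sites_in(primer))
```

The site scan finds SacI in Pla2u, XbaI (and an MfeI site) in Pla2d, BamHI and NdeI in Pbm3u, and XhoI in Pbm3d. This agrees with the enzymes used to clone the PCR#1 and PCR#5 products in Example 2.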



TABLE 2 - Peptide sequences

Name                                              SEQ ID NO  Amino acid sequence
Peptide expressed from pCW1A2cat linker sequence  34         WPGSPA
RC1 N-terminal sequence                           35         MTIKEMPQPKTFGEVLKGLR
RC2 N-terminal sequence                           36         MTIKEMPQPKRLESPKGLKS
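The peptide entries in Table 2 correspond to translations of nucleotide entries in Table 1. A short check with the standard genetic code translates SEQ ID NOs: 18, 22 and 23 and compares them with SEQ ID NOs: 34, 35 and 36. The compact TCAG-ordered codon string is a common encoding and is not taken from this document.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 0, ignoring spaces and any incomplete final codon."""
    seq = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


PAIRS = {
    "SEQ ID NO: 18 -> 34": ("TGG CCT GGG TCC CCT GCT", "WPGSPA"),
    "SEQ ID NO: 22 -> 35": ("ATG ACA ATT AAA GAA ATG CCT CAG CCA AAA ACG TTT GGA GAG GTG CTC "
                            "AAG GGT TTG AGG", "MTIKEMPQPKTFGEVLKGLR"),
    "SEQ ID NO: 23 -> 36": ("ATG ACA ATT AAA GAA ATG CCT CAG CCA AAA CGT TTG GAG AGC CCC AAA "
                            "GGC CTG AAA AGT", "MTIKEMPQPKRLESPKGLKS"),
}

for label, (dna, peptide) in PAIRS.items():
    protein = translate(dna)
    print(label, protein, "matches" if protein == peptide else "differs")
```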



EXAMPLE 1 

Random complementary fragment library of green fluorescent protein
GFP (green fluorescent protein) is a protein produced by the jellyfish Aequorea
victoria which fluoresces in the lower green portion of the visible spectrum. This Example
describes the production of a GFP library suitable for protein fragment complementation. A
gene homodimer consisting of two GFP monomers connected by a linker sequence was
constructed. After a limited digestion of the gene dimer, fragments approximating a gene
monomer in size were retrieved and inserted into an expression vector. The plasmid library
was thereafter screened to identify functional GFP variants.

Plasmid pGFP containing the complete GFP coding sequence under the lac promoter
(Clontech Laboratories) was used either intact or modified. This plasmid was transformed
into E. coli strain XL1-Blue for amplification of the plasmid and for GFP expression. The
GFP gene monomer consists of 714 bp.

a) Construction of gene homodimer with linker sequence
A GFP gene dimer in which a linker sequence was inserted between the two copies
of the GFP gene was constructed by PCR. Two linking primers were used: P1 (forward)
[SEQ ID NO: 1] and P2 (reverse) [SEQ ID NO: 2]. Each of these linking primers contains
the same (forward) linker sequence L1&2 [SEQ ID NO: 3]. Another two primers flanking
the GFP gene, and used for PCR, were P3 (forward) [SEQ ID NO: 4] and P4 (reverse) [SEQ
ID NO: 5].

Two PCR reactions were carried out. The first reaction used primers P1 and P4, and
the second used P2 and P3. In both cases, the template was pGFP. The PCR reactions were
carried out in 100 µL volume, with 25 cycles of 94°C for 1 minute, 52°C for 40 seconds, and
72°C for 1 minute, with an increment of 1 second each cycle. The PCR products were
checked by loading 3 µL of the reaction mixture onto a 1% agarose gel for electrophoresis.
In both cases, the expected DNA fragment (about 900 bp in size) was found to be the sole
product. The PCR products were purified using a Qiagen PCR purification kit. The purified
product from the first reaction was restricted by restriction enzymes XhoI and EcoRI. The
product from the second PCR reaction was restricted by XhoI and PstI. The resulting DNA
was used in the following three-piece ligation reaction.
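The cycling program above can be tallied as follows. The text does not say which step receives the 1-second increment, so this sketch assumes it is the 72°C extension step. It also ignores ramp times and any initial or final holds. The function names are hypothetical.

```python
from typing import List


def extension_times(cycles: int, base_s: int, increment_s: int) -> List[int]:
    """Extension time (s) for each cycle, growing by increment_s per cycle."""
    return [base_s + increment_s * c for c in range(cycles)]


def program_duration_s(cycles: int, denature_s: int, anneal_s: int,
                       extend_base_s: int, increment_s: int) -> int:
    """Total hold time in seconds, ignoring ramp times and any initial/final holds."""
    return cycles * (denature_s + anneal_s) + sum(
        extension_times(cycles, extend_base_s, increment_s))


times = extension_times(25, 60, 1)
total = program_duration_s(25, 60, 40, 60, 1)
print(f"first/last extension: {times[0]} s / {times[-1]} s; total hold time: {total} s")
```

For 25 cycles this gives extension times from 60 s to 84 s and 4300 s (about 72 min) of total hold time.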

2-3 µg of plasmid pGFP was restricted with PstI and EcoRI, and the approximately 2.6 kbp
band was purified from an agarose electrophoresis gel using the Qiagen extraction kit. This
fragment was ligated with the above purified PCR reaction products in a three-piece ligation
reaction by standard cohesive-end ligation. The ligation mixture was transformed into XL1-Blue
competent cells by the heat shock method. Transformed cells were plated out onto
LB/ampicillin plates. Ten colonies were picked at random and grown up in 2-3 mL cultures in
order to purify the plasmid DNA (by mini-prep). All ten colonies picked contained the
duplicated GFP gene. This new plasmid was designated GFP2x.

b) Generation of GFP gene fragments from gene dimer

The GFP gene homodimer was subjected to limited digestion using DNase I. About
80 µg of the GFP gene dimer was obtained by restriction of about 200 µg of pGFP2x with
BamHI and EcoRI, and the approximately 1.5 kbp DNA band was purified from a 1% agarose
electrophoresis gel. The dimer gene DNA was digested by adding an appropriate amount of
DNase I (about 30 µL of 0.0015 U/µL) in 100 µL reaction mixture in 50 mM Tris-HCl, pH
7.5/1 mM MnCl2 and 50 µg/mL BSA for 20-60 min at room temperature. The progress of the
reaction was checked by agarose gel electrophoresis every 5 minutes. The reaction was
stopped when the digestion products gave an even smear on the gel from about 1.5 kbp down
to about 50 bp. Fragments of about 500 bp to about 850 bp in size were purified from a 1%
agarose electrophoresis gel using the Qiagen DNA extraction kit. The purified DNA (2-4 µg)
was treated with 1 U of T4 DNA polymerase in the presence of 0.2 mM of each of the four
dNTPs in T4 polymerase reaction buffer (New England Biolabs). The reaction (25 total
volume) was allowed to proceed for 15 min at 16°C. The reaction was stopped by addition
of one volume of 15 mM Tris, pH 7.5 and two volumes of phenol:chloroform (1:1). After two
more extractions with phenol:chloroform, the DNA was precipitated by ethanol, washed once
with 70% ethanol, dried and dissolved in 20 µL water.

c) Construction of expression plasmid

A modified pGFP plasmid designated pGFP-stp, to be used as an expression vector
for the fragmented GFP gene dimers, was prepared. The pGFP-stp plasmid was constructed
so that stop codons were introduced in all three reading frames following the GFP-coding
sequence in pGFP. PCR was used to introduce the stop codons and associated sequence
alterations. Primer Pstp (reverse) was designed to introduce a stop codon in each reading
frame and a new StuI site. Primer Pstp is 5'-phosphorylated, and the sequence of Pstp
is listed in Table 1 [SEQ ID NO: 6].
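The design of Pstp can be checked directly. Because Pstp is a reverse primer, its reverse complement is the sequence it introduces on the GFP coding strand. The sketch below scans that sequence for stop codons in each of the three reading frames and for the StuI recognition sequence (AGGCCT). It does not determine which frame coincides with the GFP coding frame.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}
STUI_SITE = "AGGCCT"  # StuI recognition sequence (blunt cutter, AGG^CCT)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def first_stop(seq: str, frame: int):
    """Index (in codons) of the first stop codon in the given frame, or None."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return (i - frame) // 3
    return None


# Pstp as listed in Table 1 (SEQ ID NO: 6), without the 5' phosphate mark.
pstp = "CGA TTA TTT TTC AGC TAG GCC TAG TTG GTA ATG GTA GCG AC".replace(" ", "")
sense = revcomp(pstp)  # Pstp is a reverse primer, so this is the coding-strand sequence it creates

for frame in range(3):
    print(f"frame {frame}: first stop at codon {first_stop(sense, frame)}")
print("StuI site present:", STUI_SITE in sense)
```

Each frame reaches a stop codon within the primer sequence (at codons 6, 9 and 11, counting from zero), and the StuI site lies in the middle of the primer.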

The PCR used P3 (forward) and Pstp (reverse) with pGFP as the template. Conditions
were the same as those described above. The PCR product (about 850 bp) was restricted with
HindIII and purified using the Qiagen kit. The restricted PCR product was ligated with the
approximately 2.6 kbp fragment isolated from the digestion of pGFP with HindIII and StuI. The
ligation mixture was used to transform XL1-Blue competent cells by the heat shock method.
Since the majority of the colonies contained the pGFP-stp plasmid (as shown by the control
ligation experiment), and a minor fraction of wild-type pGFP at this stage does not affect
the final result, a pool of 60 colonies was used to grow cells for pGFP-stp plasmid
preparation.

d) Construction of a plasmid library containing the GFP gene fragments

After restriction, the pGFP-stp plasmid was dephosphorylated and ligated with the
GFP gene dimer fragments to form a plasmid library. The pGFP-stp plasmid (2-5 µg) was
restricted with SmaI and StuI at 22°C for 12 hrs and then at 37°C for another hour. The
approximately 2.6 kbp fragment was purified from a 0.8% agarose electrophoresis gel using the
Qiagen DNA extraction kit. The purified DNA (in 20 µL dephosphorylation reaction buffer) was
treated with 0.3 U of shrimp alkaline phosphatase (US Biochemical) at 37°C for 30 min. Fresh
enzyme (0.3 U) was added every 30 min. This 5'-dephosphorylated plasmid vector fragment
was ligated with the blunt-ended GFP DNA insert (500-850 bp, prepared as described above)
using a standard blunt-end ligation protocol. The ligation mixture (about 40 µL) was
transformed into XL1-Blue competent cells. The transformed cells were plated out onto LB
plates supplemented with ampicillin and IPTG.



e) Screening for active GFP

The library produced above was subjected to two rounds of screening to identify
functional GFP fragments. The first was based on fluorescence of the functional protein, and
the second was based on the restriction digestion pattern of the plasmids.
Fluorescence.

Two batches of plates prepared from two separate ligation reactions were screened.
A total of about 11,400 colonies was screened visually by briefly shining 366 nm UV light on
each plate with a hand UV lamp (UVP, Model UVGL-58) in a dark room. The colonies that
emitted green light were marked on the bottom of the plate. A total of 184 clones emitting
green light upon UV illumination were obtained and used for the next round of screening.
Digestion Pattern.

150 well-isolated colonies from the 184 green-light-emitting colonies were picked and
used to inoculate 2 mL LB/ampicillin cultures for plasmid preparation. Plasmid DNA
minipreparations were carried out for each culture. The purified plasmid DNA was subjected to
different restriction enzyme digestions. See FIG. 2A. First, a double digestion with BamHI
and EcoRI was used to estimate the wild-type GFP background present among the active
clones. The approximately 100 active colonies from the batch of plates made first contained a
high wild-type GFP content (about 80%). This was consistent with the control experiment from
this batch, in which the approximately 2.6 kbp plasmid fragment alone gave rise to a
considerable background of active GFP colonies growing on the plates. In contrast, the
approximately 50 active GFP clones from the plates in the second batch had a very low
wild-type GFP background. Furthermore, about 70% contained the unique XhoI site, which does
not exist in the wild-type GFP plasmid.
A total of 43 colonies containing plasmid with the XhoI site were identified. The
length of the insert was estimated by double digestion with BamHI and SfiI. The digestion
patterns for all 150 plasmids were analyzed, and the results for the 50 plasmids from the
second batch of plates are summarized in FIG. 2B. A large portion of the active
GFP-containing plasmids had a whole insert length greater than the GFP gene size and was
found to contain the intact GFP gene, either in front of or following the linker sequence.
Recovered wild-type or wild-type-like GFP genes from the insert library also occurred
frequently. A few inserts were found to be slightly shorter than the intact GFP gene (i.e.
lacking both EcoRI and XhoI sites).

None of the screened active GFP plasmids contained a split gene with two
complementary fragments of the whole gene, nor a considerably truncated gene. In contrast,
of two inactive GFP plasmids chosen at random from the library and subjected to the same
restriction treatment, one had the XhoI site almost in the middle of the gene. The second was
found to be a truncated gene. The linker sequence inserted between the genes in the gene
dimer was found in different positions in the final library.

It is estimated that about one-third of the genes in the blunt-end ligation product (the
gene fragment or permutation library) would have the correct reading frame when the method
of this example is used. For a protein of 300 amino acids, one will need to screen in the order
of 5 X 10* colonies in order to cover all the diversity of positions for fragmentation at a single
site.
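A rough screen size can also be estimated with the standard Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f). Here f is the probability that a single clone carries one particular variant, and P is the desired probability that this variant is represented in the screen. This is a swapped-in textbook estimate rather than the calculation used above. It assumes, hypothetically, that all fragmentation positions are equally likely and that one-third of the clones are in frame.

```python
import math


def clones_needed(n_variants: int, in_frame_fraction: float, confidence: float) -> int:
    """Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f).

    f is the chance that one clone is a specific in-frame variant, assuming all
    n_variants fragmentation positions are equally likely (a simplification).
    """
    f = in_frame_fraction / n_variants
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - f))


for positions in (300, 900):  # e.g. codon-level vs nucleotide-level positions
    print(positions, clones_needed(positions, 1 / 3, 0.95))
```

Counting positions at the nucleotide level rather than the codon level triples the number of distinct variants and roughly triples the estimated screen size.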

The wild-type background may be due to the presence of the wild-type gene after
purification of the expression plasmid vector. The presence of wild-type protein can be
eliminated or greatly reduced by either of the following approaches. First, during purification
of the expression plasmid DNA fragment (the approximately 2.6 kbp fragment), a longer agarose
gel can be used for electrophoresis to better resolve the desired fragment from the partially
digested plasmid that still contains the wild-type gene. A second and more reliable approach
is to use a plasmid vector that does not contain this gene in the first place.

Although this example showed construction of a homodimer, a heterodimer can be 
made from two different parent genes using substantially the same techniques. 




EXAMPLE 2: 
Hybrid protein library with preserved terminal sequences
This Example describes the creation of a library of protein hybrids containing
sequences from two parent proteins: human cytochrome P450 1A2 and bacterial cytochrome
P450 BM3. The resulting proteins consist of a single polypeptide chain that has the
N-terminus of the bacterial enzyme and the C-terminus of the mammalian P450. The human
P450 is membrane-associated, while the bacterial enzyme is soluble. The human P450 is
active towards a range of aromatic substrates, while the bacterial enzyme prefers long-chain
fatty acids. The library of hybrid proteins was therefore expected to contain P450s that are
soluble like the bacterial enzyme and exhibit the substrate specificity of the human enzyme.

Human cytochrome P450 1A2 having a modified N-terminus (See Fischer et al.,
1992) was used [SEQ ID NO: 27]. The heme domain of Bacillus megaterium P450 BM3
containing the mutation F87A (See Schwaneberg et al., 1999), resulting from four replaced
nucleotides at position 261 (ATTT to GGCC), was further modified by removing two
restriction sites (A to G at position 459; T to C at position 711) [SEQ ID NO: 28]. The
modified P450 1A2 gene inserted into the expression vector pCWori (Barnes, 1996) was
provided by Prof. F.P. Guengerich, and the F87A mutant of P450 BM3 inserted into the
cloning vector pUC19 was provided by Dr. U. Schwaneberg. The size of the P450 1A2 gene
monomer used was 1,515 bp, whereas the size of the heme domain of P450 BM3 used was
1,392 bp.

a) Construction of gene heterodimer with linker sequence

In this Example, the following restriction sites were used: SacI, XhoI, XbaI, MfeI and
NdeI, corresponding to R1-R5 in FIG. 3, respectively. A gene heterodimer consisting of
mammalian and bacterial P450 connected by a linker sequence was constructed. The gene of
P450 1A2 was amplified by PCR (referred to as PCR#1) from the vector pCW1A2bc using
the following combination of primers: Pla2u [SEQ ID NO: 7] and Pla2d [SEQ ID NO: 8].

Fragments of the gene of the heme domain of P450 BM3 were amplified by PCR
(referred to as PCR#2-4) from the vector pcmdheme, using the following combinations of
primer sequences: PCR#2: Pbm3u [SEQ ID NO: 9] plus Pmund [SEQ ID NO: 10]; PCR#3:
Pmunu [SEQ ID NO: 11] plus Pnded [SEQ ID NO: 12]; PCR#4: Pndeu [SEQ ID NO: 13]
plus Pbm3d [SEQ ID NO: 14].

The fragments from PCR#2-4 were purified after separation on an agarose gel using
the QiaexII purification kit, combined and used as a template for a PCR (referred to as
PCR#5) with the primer pair Pbm3u [SEQ ID NO: 9] and Pbm3d [SEQ ID NO: 14]. Primers
Pndeu [SEQ ID NO: 13] and Pnded [SEQ ID NO: 12] contain a mismatch, which removes
an internal NdeI site in the gene of BM3. Primers Pmunu [SEQ ID NO: 11] and Pmund [SEQ
ID NO: 10] contain a mismatch, which removes an internal MfeI site. The product of PCR#5
therefore encodes the gene of BM3 with two silent mutations that remove restriction sites for
NdeI and MfeI.

The PCR reactions were carried out in 50 µl volume, with 30 cycles of 94°C for 45
sec, 52°C for 45 sec, and 72°C for 2 min (PCR#1 and 5) or 1 min (PCR#2, 3, 4), using Vent
polymerase (New England Biolabs). The PCR products were separated on a 1% agarose gel
and purified using a QiaexII purification kit. The purified PCR#1 product was then restricted
by restriction enzymes SacI and XbaI. The PCR#5 product was restricted by XhoI and BamHI.
Both were ligated into pBluescript II SK(+) (Stratagene) which had been restricted by the
same pairs of restriction enzymes, to yield pB-1A2 and pB-BM3, respectively (using a standard
ligation procedure). The ligation mixtures were used to transform XL1-Blue cells by
electroporation. Transformed cells were plated out onto LB/ampicillin/Xgal/IPTG plates.
Ten white colonies were picked from each plate at random and grown up in 5 ml cultures, and
plasmid DNA was prepared. The DNA sequence of the inserted genes of both pB-1A2 and
pB-BM3 was checked by sequencing and found to be correct. Both vectors were then
restricted by and BamHI, and the reaction products were separated on a 1% agarose gel. The
linearized vector of pB-1A2 and the BM3 fragment of pB-BM3 were purified from the gel and
ligated to obtain vector pB-1B. Vector pB-1B contains the gene heterodimer with the gene of
1A2 on the 5' end and the gene of BM3 on the 3' end, separated by a linker sequence listed
in Table 1 [SEQ ID NO: 15]. E. coli XL1-Blue cells were transformed with pB-1B, cells
were grown and plasmid DNA was prepared.

b) Generation of P450 gene fragments by DNase I digestion

About 10 µg of pB-1B was treated with the restriction enzymes XhoI, SspI, ScaI and AsnI. The
linearized DNA was then subjected to a limited digestion by adding 1.2 µl DNase I (500
mU/µl) in a 120 µl reaction in 33 mM Tris-HCl, pH 7.5/10 mM MnCl2 and 50 µg/ml BSA for
60 min at room temperature. The reaction was stopped by addition of 13 µl 0.5 M EDTA and
cooling on ice. These conditions had been found to give an even smear on an agarose gel from
3 kbp down to about 100 bp. Fragments of about 1400 bp to about 1600 bp in size were
purified from a 1% agarose electrophoresis gel using the QiaexII DNA extraction kit.
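The DNase I amounts in Examples 1 and 2 can be compared by their final concentrations in the reaction. The arithmetic below uses only the volumes and stock concentrations given in the text (0.0015 U/µL is 1.5 mU/µL). Buffer, Mn2+ concentration, time, temperature and DNA amount also differ between the examples, so the comparison is only indicative. The function name is hypothetical.

```python
def final_conc_mu_per_ul(added_ul: float, stock_mu_per_ul: float, reaction_ul: float) -> float:
    """Final enzyme concentration (mU/µl) after adding added_ul of stock to reaction_ul total."""
    return added_ul * stock_mu_per_ul / reaction_ul


# Example 1: about 30 µL of 0.0015 U/µL (= 1.5 mU/µL) in a 100 µL reaction
ex1 = final_conc_mu_per_ul(30, 1.5, 100)
# Example 2: 1.2 µl of 500 mU/µl in a 120 µl reaction
ex2 = final_conc_mu_per_ul(1.2, 500, 120)
print(f"Example 1: {ex1:.2f} mU/µl, Example 2: {ex2:.2f} mU/µl, ratio about {ex2 / ex1:.0f}x")
```

This gives about 0.45 mU/µl in Example 1 and 5 mU/µl in Example 2, roughly an elevenfold difference.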
The purified DNA (100 ng) was treated with 1 U of T4 DNA polymerase in the
presence of 0.2 mM of each of the four dNTPs in T4 polymerase reaction buffer (New
England Biolabs). The reaction (33 µl total volume) was allowed to proceed for 15 min at
16°C. The reaction was stopped by heating to 65°C for 10 min. The solution was debuffered
using a Centriprep spin column.

c) Generation of P450 fragments by Exonuclease III digestion

An alternative strategy to fragment the gene dimers is to use Exonuclease III (see
Detailed Description). One of vector pB-1B was digested with 20 U SspI, and 500 ng of
vector pACYC184 (New England Biolabs) was digested with 10 U AsnI, followed by
treatment with 1 U T4 DNA polymerase in the presence of 0.2 mM of each of the four dNTPs
in T4 polymerase reaction buffer (New England Biolabs). The reactions were stopped and the
DNA was concentrated using the QiaexII kit. Both vectors were ligated together and used
to transform E. coli XL1-Blue cells by electroporation. Six colonies were picked, cells were
grown, and DNA was prepared and analyzed by restriction digestion with NcoI. Two clones
showed the correct orientation of the two ligated fragments relative to each other and were
named pB-exo+. pB-exo+ is about 9500 bases long and has a unique EagI restriction site
roughly opposite to the linker sequence which connects the genes of 1A2 and BM3.

20 µg of pB-exo+ was linearized by digestion with 50 U EagI. After inactivation of
the enzyme by heating to 65°C for 10 min, the DNA was precipitated with EtOH/NaAc and
redissolved in 20 TE. In a total volume of 200 µl, the DNA was digested with 2000 units of
Exonuclease III at 37°C. After 11 min the reaction was stopped and the 5'-3' single-strand
overhangs were removed by adding 750 µl S1 solution (Exo/S1 Kit, MBI Fermentas). These
conditions had been determined to give a smear on an agarose gel from 1800 bp down to 1100
bp. Fragments of about 1400 bp to about 1600 bp in size were purified from a 1% agarose
electrophoresis gel using the QiaexII DNA extraction kit.

d) Circularization of the gene fragments to obtain full-length genes

The gene fragments obtained by the methods described in b) and c) were circularized
by treatment with 3 Weiss units of T4 DNA ligase for 20 h at 25°C in 30 µl of 100 mM
Tris/HCl, pH 7.5, 3 mM DTT, 50 µM rATP, 10 mM MgCl2.
e) Construction of a plasmid library containing the cytochrome P450 hybrids

An expression vector can be constructed, containing all necessary features for the
expression of a gene incorporated at the two restriction sites, which are identical to the ones
in the linker sequence. The start codon can be within the gene fragment (BM3), as can the
stop codon (1A2). Two additional stop codons can be incorporated into the expression vector
(in the 3' direction of the two restriction sites) to avoid unnecessary read-through during the
translation of fragments that are not in the correct reading frame. The vector can be cut with
the same two restriction enzymes and ligated with the DNA fragments from above, using
standard procedures for sticky-end ligation (See Sambrook et al., 1989). The ligation mixture
can be used to transform a suitable host for protein expression.

f) Analysis of library 

About 100 transformants can be picked at random and used as templates for colony PCR with the primer pair BM3 forward (Pbm3u [SEQ ID NO: 9]) and 1A2 reverse (Pla2d [SEQ ID NO: 8]). The products of the reactions can be analyzed by agarose gel electrophoresis. Only clones carrying a full-length hybrid gene should show a band of about 1.5 kbp. In a second colony PCR, two primers that bind to the expression vector and flank the inserted gene fragments can be used, and the products again analyzed by agarose gel electrophoresis. Together, these two experiments give the percentage of clones that contain a P450 fragment insert and the percentage of those that contain a hybrid of BM3 and 1A2. Restriction digestion of the PCR products of selected hybrid-containing clones narrows down the position of the crossover point. For this experiment, restriction enzymes can be used that give different restriction patterns in the BM3 and 1A2 genes but cut no more than 3-5 times. Alternatively, the crossover point can be narrowed down by nested PCR using internal primers.
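The two colony PCR screens combine into two percentages. A minimal sketch of that bookkeeping, assuming each clone is scored as two yes/no gel results (the `ColonyPCR` record and its field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ColonyPCR:
    insert_band: bool  # vector-flanking primers gave an insert-sized product
    hybrid_band: bool  # BM3 forward + 1A2 reverse primers gave a ~1.5 kbp product


def summarize(clones):
    """Return (% of clones with an insert, % of insert-carrying clones that are hybrids)."""
    with_insert = [c for c in clones if c.insert_band]
    hybrids = [c for c in with_insert if c.hybrid_band]
    pct_insert = 100.0 * len(with_insert) / len(clones) if clones else 0.0
    pct_hybrid = 100.0 * len(hybrids) / len(with_insert) if with_insert else 0.0
    return pct_insert, pct_hybrid


# Five hypothetical clones: three hybrids, one non-hybrid insert, one empty vector.
clones = [ColonyPCR(True, True)] * 3 + [ColonyPCR(True, False), ColonyPCR(False, False)]
print(summarize(clones))  # (80.0, 75.0)
```

The hybrid percentage is taken relative to insert-carrying clones, matching "the percentage of those which contain a hybrid".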

g) Screening for active P450

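The library can be analyzed for active P450 variants by coexpressing a P450 reductase using a standard protocol (See Chang and Waxman, 1998). About one third of the library is expected to contain genes that are in the correct reading frame over their entire length, since a randomly placed crossover is equally likely to leave the downstream parent sequence in any of the three reading frames.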
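The one-third expectation can be checked by counting crossover pairs. This sketch assumes every combination of "nucleotides kept from the N-terminal parent" and "nucleotides skipped at the start of the C-terminal parent" is equally likely, and that both parents are numbered from the first base of a codon. The lengths used are illustrative.

```python
from fractions import Fraction


def residue_counts(positions):
    """Count how many positions fall in each residue class modulo 3."""
    counts = [0, 0, 0]
    for p in positions:
        counts[p % 3] += 1
    return counts


def in_frame_fraction(len_n: int, len_c: int) -> Fraction:
    """Fraction of crossover pairs that keep the C-terminal parent in frame.

    n_kept ranges over 1..len_n (bases kept from the N-terminal parent);
    c_skip ranges over 0..len_c-1 (bases skipped in the C-terminal parent).
    The junction is in frame when n_kept and c_skip are congruent modulo 3.
    """
    n_counts = residue_counts(range(1, len_n + 1))
    c_counts = residue_counts(range(len_c))
    in_frame = sum(a * b for a, b in zip(n_counts, c_counts))
    return Fraction(in_frame, len_n * len_c)


print(in_frame_fraction(1500, 1512))  # 1/3
```

When neither length is a multiple of three, the fraction differs from 1/3 only by a small edge effect.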

EXAMPLES

Hybrid protein library with preserved terminal sequences
Using techniques similar to those described in Example 2, a library of protein hybrids was created containing sequences from the same two parent proteins: modified human cytochrome P450 1A2 [SEQ ID NO: 27] and an F87A mutant of B. megaterium cytochrome P450 BM3 [SEQ ID NO: 28]. The resulting proteins consist of a single polypeptide chain, with the N-terminus of the bacterial enzyme and the C-terminus of the human (mammalian) P450. The mammalian P450 is membrane-associated, while the bacterial enzyme is soluble. The mammalian P450 is active towards a range of aromatic substrates, while the bacterial enzyme prefers long-chain fatty acids. The library of hybrid proteins was screened for P450 proteins that are soluble like the bacterial enzyme and exhibit the substrate specificity of the mammalian enzyme.

a) Construction of gene heterodimer with linker sequence 

Gene heterodimers were constructed as described in Example 2, section a).

b) Generation of P450 gene fragments by DNase I digestion and S1-nuclease treatment

About 100 µg of pB-lB (see Example 2) was digested with the restriction enzymes XhoI, SspI, SacI and AsnI and subsequently desalted. About 25 µg of that DNA was then digested by adding 2.5 µl DNase I (500 mU/µl) in a 300 µl reaction in 33 mM Tris/HCl, pH 7.5, 10 mM MnCl2 and 50 ng/ml BSA for 15 min at 26 °C. The reaction was stopped by adding 13 µl of 0.5 M EDTA and cooling on ice. After purification of the DNA with the QiaexII DNA extraction kit (to remove the DNase), the DNA was further digested with 35 U S1-nuclease in 35 µl of 25 mM potassium acetate buffer (pH 4.6) supplemented with 200 mM NaCl, 0.9 mM ZnSO4 and 4% glycerol for 40 min at 22 °C, to make the DNA fragments blunt-ended. The resulting fragments were separated on an agarose gel. These conditions had been found to give an even smear on an agarose gel from 3 kb down to about 100 bp. Fragments of about 1400 to about 1600 bp were purified from a 1% agarose electrophoresis gel using the QiaexII DNA extraction kit.
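A crude way to picture digestion followed by size selection is a Monte Carlo model. The sketch below assumes, for illustration only, that each dimer molecule is about 3000 bp, receives two double-strand breaks at uniformly random positions, and that the linker sits at position 1500. None of these numbers comes from the experiment. Fragments that span the linker are the ones that can later yield hybrids.

```python
import random


def random_fragments(length: int, n_cuts: int, rng: random.Random):
    """Cut a linear molecule at n_cuts distinct random positions; return (start, end) pairs."""
    cuts = sorted(rng.sample(range(1, length), n_cuts))
    bounds = [0] + cuts + [length]
    return list(zip(bounds[:-1], bounds[1:]))


def size_select(fragments, lo: int, hi: int):
    """Keep fragments whose length lies within the gel-excision window [lo, hi]."""
    return [(s, e) for s, e in fragments if lo <= e - s <= hi]


def spans(fragment, position: int) -> bool:
    """True if the half-open fragment [start, end) contains the given position."""
    start, end = fragment
    return start <= position < end


rng = random.Random(1)
pool = []
for _ in range(10_000):  # 10,000 hypothetical dimer molecules
    pool.extend(random_fragments(3000, 2, rng))
selected = size_select(pool, 1400, 1600)
with_linker = [f for f in selected if spans(f, 1500)]
print(len(selected) / len(pool), len(with_linker) / max(len(selected), 1))
```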

c) Generation of P450 fragments by Exonuclease III digestion

P450 fragments were also generated by an alternative strategy, Exonuclease III digestion, as described in Example 2, section c). Fragments of about 1450 bp to about 1550 bp were purified from a 1% agarose electrophoresis gel using the QiaexII DNA extraction kit.

d) Circularization of the gene fragments to obtain full-length genes
The gene fragments obtained by the digestion methods described in b) and c) were circularized by treatment with 3 Weiss units of T4 DNA ligase for 20 h at 25 °C in 30 µl of 100 mM Tris/HCl, pH 7.5, 3 mM DTT, 50 µM rATP, 10 mM MgCl2.
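Why circularizing and then reopening the molecule within the linker gives a hybrid can be shown with toy strings. This sketch assumes, for illustration, that the dimer is ordered 1A2-linker-BM3. That order is not stated in this section, but it is the order that yields BM3 N-termini and 1A2 C-termini under this scheme. The sketch also drops the linker entirely, whereas a real restriction cut would leave part of the site on each end.

```python
def circularize_and_open(fragment: str, linker: str) -> str:
    """Ligate the ends of a linear fragment into a circle, then open it inside the linker.

    In the circle the fragment's right end is joined to its left end, so reading
    from just after the linker around to just before it gives the new linear gene.
    """
    idx = fragment.find(linker)
    if idx < 0:
        raise ValueError("fragment does not contain the linker")
    return fragment[idx + len(linker):] + fragment[:idx]


# Toy parents: lowercase letters stand for 1A2, uppercase letters for BM3.
p1a2, linker, bm3 = "abcdefghij", "-link-", "ABCDEFGHIJ"
dimer = p1a2 + linker + bm3
fragment = dimer[4:22]  # a size-selected fragment spanning the linker
print(fragment)                                # efghij-link-ABCDEF
print(circularize_and_open(fragment, linker))  # ABCDEFefghij
```

The output begins with the N-terminal part of BM3 and ends with the C-terminal part of 1A2, and its length depends on where the two random cuts fell.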

e) Analysis of gene libraries 

Both libraries of circularized gene fragments, i.e., the library obtained by DNase I/S1-nuclease digestion (the "DNase I library") and the one obtained by Exonuclease III digestion (the "Exo III library"), were restricted with XbaI and used as PCR templates with the primers Pbm3u [SEQ ID NO: 9] and Pla2d [SEQ ID NO: 8]. The PCR reactions were carried out in a 50 µl volume, with 30 cycles of 94 °C for 45 s, 52 °C for 45 s and 72 °C for 1 min 30 s, using Vent polymerase (New England Biolabs). The purified PCR products were then restricted with the restriction enzymes BamHI and XbaI, separated on a 1% agarose gel, and purified using a QiaexII purification kit. Both were ligated into pBluescript II SK(+) that had been restricted with the same pair of restriction enzymes (using a standard ligation procedure). The ligation mixtures were used to transform XL1-Blue cells by electroporation. Transformed cells were plated onto LB/ampicillin/X-gal/IPTG plates. About 50 white colonies of each sample were picked at random and used as templates for colony PCR with the primers Pbm3u and Pla2d.

The products of these PCR reactions were analyzed by gel electrophoresis, and their approximate lengths were determined by comparison with a DNA standard (1 kb ladder, Fermentas). The two libraries showed average lengths of 1395 ± 100 bp (DNase I) and 1430 ± 110 bp (Exo III). Subsequently, the same colonies were used as templates for PCR analysis to reveal the approximate positions of the crossover points within the genes. PCR reactions were performed with the primer pairs Pmunu [SEQ ID NO: 11] and Pla2d [SEQ ID NO: 8]; PNdeu [SEQ ID NO: 13] and Pla2d [SEQ ID NO: 8]; and Pla2i2r [SEQ ID NO: 16] and Pla2d [SEQ ID NO: 8]. The PCR products were analyzed by agarose gel electrophoresis.
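Each internal primer paired with the 1A2 reverse primer acts as a yes/no test of where the crossover lies. A minimal sketch, assuming each forward primer anneals to the N-terminal (BM3) parent at a known position. The positions below are hypothetical, and a primer that anneals within the C-terminal parent would invert the logic. The bounds are only as precise as the primer length.

```python
def bracket_crossover(results):
    """Bound a crossover position from forward-primer PCR results.

    results: iterable of (primer_position, product_seen) for forward primers on the
    N-terminal parent, each paired with the same C-terminal reverse primer.
    A product means the N-terminal sequence extends past that primer (crossover is
    downstream); no product means the crossover is upstream of it.
    Returns (lower_bound, upper_bound); upper_bound is None if never bounded.
    """
    lower, upper = 0, None
    for position, product_seen in results:
        if product_seen:
            lower = max(lower, position)
        elif upper is None or position < upper:
            upper = position
    return lower, upper


# Hypothetical primer positions (nt) and gel results for one colony.
print(bracket_crossover([(300, True), (700, True), (1100, False)]))  # (700, 1100)
```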

In the library created by DNase I digestion, the crossovers were spread over the whole gene (40% of the crossovers were found in the first third of the gene, 20% in the following sixth, 10% in the next sixth and 30% in the last third). Sequencing of one randomly chosen variant from the DNase I library revealed a hybrid denoted RC3 [SEQ ID NO: 31], containing the first 1182 nucleotides from BM3 followed by nucleotides 1233 to 1512 from 1A2. The crossover section is shown in the Sequence Listing [SEQ ID NO: 24].
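Because the four gene segments above have unequal lengths, comparing crossover shares directly can mislead. Dividing each share by its segment's share of the gene gives a density relative to a uniform distribution, where 1 means uniform. This is a sketch of that normalization using the percentages reported above. With only about 50 colonies analyzed per library, these ratios carry considerable sampling uncertainty.

```python
from fractions import Fraction

# (share of gene length, share of observed crossovers) for consecutive segments,
# taken from the DNase I library percentages above.
SEGMENTS = [
    (Fraction(1, 3), Fraction(40, 100)),
    (Fraction(1, 6), Fraction(20, 100)),
    (Fraction(1, 6), Fraction(10, 100)),
    (Fraction(1, 3), Fraction(30, 100)),
]


def relative_density(segments):
    """Crossovers per unit length relative to a uniform distribution (1 = uniform)."""
    if sum(length for length, _ in segments) != 1:
        raise ValueError("segment lengths must cover the whole gene")
    return [share / length for length, share in segments]


print([str(d) for d in relative_density(SEGMENTS)])  # ['6/5', '6/5', '3/5', '9/10']
```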



wo 01/30998 



-41 - 



PCT/USOO/29717 



In the Exo III library, the crossovers were located in a region around nucleotide 500 ± 300. Sequencing of two randomly chosen variants revealed one hybrid denoted RC4, with nucleotides 1-343 from BM3 and nucleotides 370-1512 from 1A2 ([SEQ ID NO: 32]; crossover section in [SEQ ID NO: 25]), and one hybrid denoted RC5, with nucleotides 1-385 from BM3 and nucleotides 282-1512 from 1A2 ([SEQ ID NO: 33]; crossover section in [SEQ ID NO: 26]).
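Given crossover coordinates like those reported for the sequenced variants, the hybrid length and whether the junction preserves the reading frame follow from simple arithmetic. This sketch assumes both parents are numbered from the first base of the start codon and that no bases are gained or lost at the junction. If either assumption fails, the frame result does not apply.

```python
def hybrid_summary(n_end: int, c_start: int, c_end: int):
    """Length and junction frame of a hybrid built from two parents.

    n_end: last nucleotide kept from the N-terminal parent (1-based, from nt 1).
    c_start, c_end: 1-based inclusive range kept from the C-terminal parent.
    """
    length = n_end + (c_end - c_start + 1)
    in_frame = n_end % 3 == (c_start - 1) % 3
    return length, in_frame


# Coordinates reported above for the three sequenced variants (BM3 end, 1A2 start, 1A2 end).
for n_end, c_start, c_end in [(1182, 1233, 1512), (343, 370, 1512), (385, 282, 1512)]:
    print(n_end, c_start, c_end, hybrid_summary(n_end, c_start, c_end))
```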

In this experiment, therefore, the method for producing gene fragments described in section b) was more suitable for producing hybrid proteins with crossovers distributed along the entire gene, whereas the method described in section c) gave a more limited range of crossover positions. The Exo III method may thus be the digestion method of choice if, for example, it is desirable to conserve a larger portion of the N- and/or C-terminal region of the parent protein(s) because of a particular function of that region. Another potential reason could be to target the crossover to a specific region that has been identified by other methods (e.g., computational methods) as promising for obtaining functional hybrids.

f) Construction of a vector for the expression of the cytochrome P450 hybrids

The gene for chloramphenicol acetyl transferase (cat) was amplified by PCR from the vector pACYC184 using the primers Pcatc [SEQ ID NO: 17] and Pcatn [SEQ ID NO: 7] under the following conditions: 50 µl, 1 min at 95 °C, then 25 cycles of 45 s at 94 °C, 45 s at 52 °C and 1 min 8 s at 72 °C. The PCR product was digested with MfeI and XbaI and ligated into the correspondingly digested vector pCW1A2 (See Barnes, 1996). The ligation mixture was used to transform XL1-Blue cells. The resulting plasmid (pCW1A2cat) contains the gene for chloramphenicol acetyl transferase immediately following the 1A2 gene, which itself has lost its stop codon. Translation of the 1A2 gene therefore produces a fusion protein between 1A2 and cat with the linker sequence WPGSPA [SEQ ID NO: 34] in between, encoded by the nucleotide sequence listed in Table 1 [SEQ ID NO: 18].

pCW1A2cat was digested with SalI, treated with Vent polymerase to create blunt ends, and re-ligated to obtain pCW1A2rfcat. This vector is identical to pCW1A2cat except for a shift in the reading frame at amino acid 478 of 1A2. Using the primers Pscctu [SEQ ID NO: 19] and Pscctd [SEQ ID NO: 20] and the QuikChange mutagenesis kit (Stratagene), the intrinsic start codon of the cat gene in pCW1A2cat and pCW1A2rfcat was changed to a serine codon (ATG to AGC), producing pCW1A2sccat and pCW1A2rfsccat.
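Why SalI digestion followed by blunting and re-ligation shifts the reading frame can be sketched at the sequence level. SalI cuts G^TCGAC and leaves a four-base 5' overhang (TCGA). Assuming the overhangs are filled in rather than trimmed, blunt ligation duplicates those four bases. A 4 nt insertion is not a multiple of three, so downstream codons are read one position off. Trimming would instead delete four bases, which also shifts the frame. The sequence below is hypothetical.

```python
SALI_SITE = "GTCGAC"  # SalI recognition site; cuts G^TCGAC on both strands


def fill_in_and_religate(seq: str, site: str = SALI_SITE,
                         cut_offset: int = 1, overhang_len: int = 4) -> str:
    """Model cutting at the first site, filling in both 5' overhangs, and blunt ligation."""
    i = seq.find(site)
    if i < 0:
        raise ValueError("site not found")
    cut = i + cut_offset
    overhang = seq[cut:cut + overhang_len]
    # The filled-in left end gains a copy of the overhang; the right end keeps the original.
    return seq[:cut] + overhang + seq[cut:]


original = "AAAGTCGACAAA"
mutant = fill_in_and_religate(original)
print(mutant)                             # AAAGTCGATCGACAAA
print((len(mutant) - len(original)) % 3)  # 1, so the downstream frame is shifted
```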
g) Construction of expression libraries and preselection

The DNase I and Exo III gene libraries obtained as described in sections b)-d) were amplified by PCR using the primers Pla2d [SEQ ID NO: 8] and Pbm3bam [SEQ ID NO: 21].

Both gene libraries were then restricted with MfeI and BamHI, purified by gel electrophoresis, and ligated into the vector pCW1A2rfsccat. Prior to ligation, the pCW1A2rfsccat vector had been treated with MfeI and BamHI to remove the insert (1A2rf) and purified by gel electrophoresis. XL1-Blue cells were transformed with the ligation mixtures and plated on LB-Amp agar. About 250,000 clones were obtained for the DNase I library and about 60,000 clones for the Exo III library. Cells were scraped from the agar and resuspended in LB-Amp medium. Serial dilutions of the cells were plated on agar consisting of expression medium (TB medium plus 1 mM IPTG, 0.5 mM δ-aminolevulinic acid, 1 mM thiamine and trace elements) including 40 ng/ml chloramphenicol.
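Because the cat start codon was removed, the chloramphenicol plates appear designed to favor hybrids whose open reading frame runs from the BM3 start codon into cat in frame and without a premature stop. That rationale is our interpretation, not stated in the text. The sketch below expresses it on toy sequences. `reads_into_reporter` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reads_into_reporter(orf: str, reporter_start: int) -> bool:
    """True if translation from orf[0] reaches reporter_start in frame without a stop.

    orf: sequence beginning at the hybrid gene's start codon.
    reporter_start: index of the first base of the (start-codon-less) reporter.
    """
    if reporter_start % 3 != 0:
        return False  # the reporter would be read in the wrong frame
    for i in range(0, reporter_start, 3):
        if orf[i:i + 3].upper() in STOP_CODONS:
            return False  # translation terminates before reaching the reporter
    return True


# Hypothetical constructs: a short hybrid ORF followed by reporter sequence.
print(reads_into_reporter("ATGGCTGCA" + "AGCAAA", 9))  # True
print(reads_into_reporter("ATGTAAGCA" + "AGCAAA", 9))  # False (premature stop)
print(reads_into_reporter("ATGGCTGC" + "AGCAAA", 8))   # False (frame shift)
```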

h) Screening 

About 2000 colonies were picked from expression libraries D and E on the TB selection agar and used to inoculate 25 µl TB+ medium (TB including 1 mM thiamine and trace elements) in 96-well fluorescence microtiter plates. Another 5,000 to 10,000 colonies were picked in pools of ten per well. The plates were incubated for 20 h at 30 °C and 270 rpm. Then 100 µl of TB++ (TB+ including 1 mM IPTG and 0.5 mM δ-aminolevulinic acid) was added, and the plates were incubated for another 20 h at 30 °C and 270 rpm.

To assay the variants for activity, 125 µl of 25 mM Tris/HCl, pH 7.4, 10 mM MgCl2, 100 mM KCl, 5 µM 7-ethoxyresorufin was added to each well (See Chang and Waxman, 1998). Fluorescence at 595 ± 20 nm after excitation at 550 ± 10 nm was measured immediately and again after a 3 h incubation at 37 °C. By subtracting the first measurement from the second, variants were identified that showed an increase in fluorescence due to the de-ethylation of 7-ethoxyresorufin, a typical reaction of P450 1A2.

i) Characterization of variants 

Two variants active in the de-ethylation of 7-ethoxyresorufin, RC1 [SEQ ID NO: 29] and RC2 [SEQ ID NO: 30], were found in the library. Sequencing of RC1 revealed the N-terminal nucleotide sequence [SEQ ID NO: 22] and corresponding amino acid sequence [SEQ ID NO: 35] listed in FIG. 8 and Tables 1 and 2. Sequencing of RC2 revealed the N-terminal nucleotide sequence [SEQ ID NO: 23] and corresponding amino acid sequence [SEQ ID NO: 36] listed in FIG. 8 and Tables 1 and 2. In FIG. 8, sequences originating from BM3 are in bold type and sequences originating from 1A2 are in italic type.

In RC1, the first 15 amino acids of 1A2 have been replaced by the 14 N-terminal amino acids of BM3. RC1 is therefore almost a full-length 1A2 with a more hydrophilic N-terminus. RC2 contains the first 44 nucleotides from BM3, but with a deletion of one of the A's in the A-quintuplet at nucleotide residues 27-31. This shifts the reading frame at amino acid 11 of BM3. The crossover 12 nucleotide residues further downstream restores the correct reading frame at amino acid 25 of 1A2. RC2 therefore also consists mainly of 1A2.

j) Analysis of hybrid variants

RC1 and RC2 were both subcloned to remove the cat fusion from their C-termini. After plasmid preparation, the genes of both variants were excised with BamHI and MfeI, gel-purified, and ligated into a pCWori derivative that reintroduced the native stop codon of 1A2. XL1-Blue cells were transformed with the ligation mix, and plasmids were purified from the transformants, verified by restriction analysis, and used to transform DH5α cells. Together with wild-type 1A2, both variants were then overexpressed in this strain in 250 ml volumes of TB++ medium.

Cellular localization. To analyze the solubility of the variants, the localization of the proteins within the DH5α cells was determined. Equal amounts of cells transformed with each variant were lysed by ultrasonication and centrifuged at 100,000 g for 2 h. The upper two thirds of the supernatant were removed and re-centrifuged under the same conditions. Again, the upper two thirds were removed and saved as a membrane-free cytosolic fraction. The pellet of the first centrifugation was resuspended and saved as the membrane fraction; the rest was discarded. Both fractions were analyzed for their content of P450 enzymes using the P450 peak, and for 1A2 activity (de-ethylation of 7-ethoxyresorufin) using an NADPH regeneration system (Shimada, 1998) and P450 oxidoreductase from rat in microsomes.

While essentially no wild-type 1A2 could be found in the cytosolic fraction (less than 10 nM), RC2 was detected at a concentration of about 120 nM. In addition, the cytosolic fraction with RC2 showed strong activity, while that of 1A2 was at the detection limit. From the concentrations in the different samples, about 14% of RC2 was estimated to partition to the cytosol, compared with less than 2% of wild-type 1A2. Although some RC1 could be detected in the cytosol (about 5%), the majority of the protein was still bound to the membrane. In addition, Western blot (immunoblot) analysis of the cytosolic fractions gave a very strong signal for RC2, a weaker signal for RC1, and a barely visible signal for 1A2. Thus, RC1 was less soluble than RC2 but more soluble than wild-type 1A2.
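The text does not give the partition formula. A minimal sketch, assuming the amount of enzyme in each fraction is its measured concentration times the fraction volume, is shown below. The numbers in the example are hypothetical.

```python
def percent_in_cytosol(cyto_conc, cyto_volume, membrane_conc, membrane_volume):
    """Percentage of total enzyme found in the cytosolic fraction.

    Amounts are taken as concentration x volume; any consistent units work.
    """
    cyto_amount = cyto_conc * cyto_volume
    membrane_amount = membrane_conc * membrane_volume
    total = cyto_amount + membrane_amount
    if total == 0:
        raise ValueError("no enzyme detected in either fraction")
    return 100.0 * cyto_amount / total


# Hypothetical: 100 nM in 1 ml cytosol, 300 nM in 1 ml membrane suspension.
print(percent_in_cytosol(100, 1.0, 300, 1.0))  # 25.0
```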

Membrane solubilization. In a second experiment, different amounts of detergent (0.5% sodium cholate plus 0, 0.01, 0.05 or 0.2% Triton X-100) were used to extract the P450 enzymes from the membranes. After centrifugation for 1 h (100,000 g), both the supernatant and the pellet were analyzed for activity. None of the supernatants showed activity. The resuspended pellet of 1A2 had activity up to 0.05% Triton X-100, while those of RC1 and RC2 had activity only up to 0.01% Triton X-100. Western blot analysis of the samples showed that after treatment with 0.5% sodium cholate and 0.05% Triton X-100, RC1 and RC2 were almost completely solubilized, while the vast majority of 1A2 was still membrane-bound.

Enzyme activity. The activity of the P450 enzymes was investigated by measuring the de-ethylation of 2.5 µM 7-ethoxyresorufin in an NADPH regeneration system (5 mM glucose-6-phosphate, 2 mM NADP+, and 0.6 U/l glucose-6-phosphate dehydrogenase). The specific activity of both RC1 and RC2 was approximately 50% ± 10% of that of wild-type P450 1A2. However, because of the higher solubility of the chimeras, the total activity of RC1 and RC2 in the cytosolic fractions was approximately 1.5 and 7.5 times that of wild-type P450 1A2, respectively.
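Total cytosolic activity scales with specific activity times the amount of enzyme present. As a rough consistency check, assuming equal fraction volumes, the figures reported for RC2 can be combined as follows. Wild-type 1A2 was below 10 nM in the cytosol, so using 10 nM gives a lower bound for the ratio. That bound is about 6, in line with the roughly 7.5-fold value reported.

```python
def relative_total_activity(specific_ratio, variant_conc, reference_conc):
    """Total cytosolic activity of a variant relative to a reference enzyme.

    specific_ratio: variant specific activity / reference specific activity.
    variant_conc, reference_conc: cytosolic P450 concentrations (same units).
    """
    return specific_ratio * variant_conc / reference_conc


# RC2: about 120 nM in the cytosol, specific activity about 0.5 x wild type;
# wild-type 1A2 was below 10 nM, so 10 nM gives a lower bound on the ratio.
print(relative_total_activity(0.5, 120, 10))  # 6.0
```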

This example thus demonstrates a successful application of the invention. From a library of 2000 variants of hybrid proteins constructed from the parents BM3 and 1A2, two variants were found that have (i) an N-terminal portion of BM3 and a C-terminal portion of 1A2; (ii) P450 activity; and (iii) improved solubility compared with the parent 1A2.
WHAT IS CLAIMED IS:



1. A method for producing a polynucleotide library, comprising the steps of:

(a) preparing a polynucleotide construct comprising at least two parent polynucleotides connected by a linker sequence;

(b) digesting the construct;

(c) selecting fragments of the digested polynucleotide construct which approximate a predetermined size; and

(d) circularizing the selected fragments.

2. The method of claim 1, wherein the digestion of the polynucleotide construct comprises limited digestion with at least one of a DNase and a nuclease.

3. The method of claim 2, wherein the DNase is DNase I.

4. The method of claim 2, wherein the nuclease is Exonuclease III.

5. The method of claim 1, wherein the predetermined size is at least one size that is in the range from approximately the size of at least one parent polynucleotide to approximately the size of at least one other parent polynucleotide.

6. The method of claim 1, wherein polynucleotide dimer fragments of the predetermined size are selected by gel electrophoresis.

7. The method of claim 1, wherein the ends of the polynucleotide dimer fragments are converted from staggered to blunt ends prior to circularization.

8. The method of claim 1, wherein the circularized fragments are linearized by treatment with at least one restriction enzyme specific for at least one restriction site within a linker sequence.
9. The method of claim 8, wherein the linearized fragments are inserted into expression vectors.

10. A method to produce a protein library, comprising expressing the vectors of claim 9 in an expression system.

11. The method of claim 8, comprising using the linearized fragments as templates for PCR amplification.

12. The method of claim 11, wherein at least one PCR primer is specific for at least one of a sequence corresponding to an original terminus of a polynucleotide and a sequence located within a linker sequence.

13. The method of claim 12, comprising inserting the PCR product into expression vectors.

14. A method to produce a protein library, comprising expressing the vectors of claim 13 in an expression system.

15. The method of claim 1, comprising using the circularized fragments as templates for PCR amplification.

16. The method of claim 15, wherein at least one PCR primer is specific for at least one of a sequence corresponding to an original terminus of a polynucleotide and a sequence located within a linker sequence.

17. The method of claim 16, comprising inserting the PCR product into expression vectors.

18. A method to produce a protein library, comprising expressing the vectors of claim 17 in an expression system.
19. The method of claim 1, wherein at least one polynucleotide encodes for a membrane-associated polypeptide.

20. The method of claim 1, wherein at least one parent polynucleotide provides a first property different from a second property provided by at least one other parent polynucleotide of the construct.

21. The method of claim 20, wherein the first and second properties are selected from the group consisting of hydrophilicity, hydrophobicity, foldability, expressability, substrate specificity, reaction product, and enzyme activity.

22. The method of claim 20, comprising inserting the linearized fragments into expression vectors.

23. A method to produce a protein library, comprising expressing the vectors of claim 22 in an expression system.

24. A polynucleotide library produced according to the method in claim 20.

25. A protein library produced according to the method in claim 23.

26. The method of claim 8, wherein the linearized fragments comprise polynucleotide sequences comprising

(a) an N-terminal sequence providing a property from the N-terminal region of at least one parent polynucleotide; or

(b) a C-terminal sequence providing a property from the C-terminal region of at least one parent polynucleotide.

27. The method of claim 26, wherein the property is selected from the group consisting of hydrophilicity, hydrophobicity, foldability, expressability, substrate specificity, reaction product, and enzyme activity.
28. The method of claim 1, wherein the sequence identity between at least two parent polynucleotides is less than 50%.

29. A polynucleotide library produced by the method in claim 1.

30. A protein library produced by the method in claim 10.

31. A protein library produced by the method in claim 14.

32. A protein library produced by the method in claim 18.

33. A method for producing a polynucleotide library, comprising the steps of: 

(a) constructing a first and second parent polynucleotide construct, each 
comprising a region encoding for a polypeptide, an upstream primer and a 
downstream primer, 

wherein one primer of the first polynucleotide construct comprises a 
restriction site for a first restriction enzyme, and the other primer 
comprises a restriction site for a second restriction enzyme, and 
one primer of the second polynucleotide construct comprises a 
restriction site for a third restriction enzyme, and one primer comprises 
a restriction site for the second restriction enzyme; 

(b) cutting the polynucleotide constructs with a mixture of restriction enzymes, the 
first polynucleotide construct being cut with the first and second restriction 
enzymes, and the second polynucleotide construct being cut with the second 
and third restriction enzymes; 

(c) cutting a vector with the first and third restriction enzymes; 

(d) ligating the first polynucleotide construct, the second polynucleotide construct, 
and the vector, to form a vector construct comprising a polynucleotide dimer 
connected by a linker sequence; 

(e) amplifying the vector construct by PCR; 

(f) excising a polynucleotide dimer from the amplified vector by cutting with the 
first and third restriction enzymes; 
(g) digesting the polynucleotide dimer;

(h) selecting digested polynucleotide dimers of a predetermined size; and

(i) circularizing the selected digested polynucleotide dimers to form a circular construct.

34. The method of claim 33, wherein the predetermined size is at least one size that is in the range from approximately the size of at least one parent polynucleotide to approximately the size of at least one other parent polynucleotide.

35. The method of claim 33, wherein the linker sequence contains a restriction site for a fourth restriction enzyme.

36. The method of claim 33, wherein the circularized fragments are linearized by treatment with the fourth restriction enzyme.

37. The method of claim 33, wherein the linearized fragments are inserted into expression vectors.

38. A method for producing a protein library, comprising expressing the vectors of claim 37 in an expression system.

39. A protein produced by the method in claim 38.

40. A method for producing a polynucleotide library, comprising the steps of:

(a) preparing a polynucleotide construct comprising at least two polynucleotides connected by a linker sequence comprising at least one restriction site;

(b) digesting the construct;

(c) selecting fragments of the digested polynucleotide construct which approximate a predetermined size;

(d) ligating the selected fragments to concatemers;

(e) digesting the concatemers;

(f) selecting fragments of the digested concatemers which approximate the predetermined size; and

(g) circularizing the selected digested concatemers.

41. The method of claim 40, wherein the digestion of at least one of the polynucleotide dimer and the concatemers comprises limited digestion with at least one of a DNase and a nuclease.

42. The method of claim 41, wherein the DNase is DNase I.

43. The method of claim 41, wherein the nuclease is Exonuclease III.

44. The method of claim 40, wherein the predetermined size is at least one size that is in the range from approximately the size of at least one parent polynucleotide to approximately the size of at least one other parent polynucleotide.

45. The method of claim 40, wherein at least one of the polynucleotide dimer fragments and the polynucleotide concatemer fragments of the predetermined size are selected by gel electrophoresis.

46. The method of claim 40, comprising linearizing the circularized digested concatemer fragments by treatment with at least one restriction enzyme specific for at least one restriction site within a linker sequence.

47. The method of claim 40, comprising inserting the linearized fragments into expression vectors.

48. A method to produce a protein library, comprising expressing the vectors of claim 47 in an expression system.

49. A polynucleotide library produced according to the method in claim 40.
50. A protein library produced according to the method in claim 48.

51. A method for producing a polynucleotide library, comprising the steps of:

(a) preparing a first polynucleotide construct comprising at least two parent polynucleotides connected by a linker sequence comprising at least one restriction site;

(b) digesting the first construct;

(c) creating a first polynucleotide library by selecting fragments of the first digested polynucleotide construct which approximate a predetermined size;

(d) preparing a second polynucleotide construct comprising at least the two parent polynucleotides connected by a linker sequence comprising at least one restriction site, wherein the polynucleotides are placed in the opposite order to that in the first polynucleotide construct;

(e) digesting the second construct;

(f) creating a second polynucleotide library by selecting fragments of the second digested polynucleotide construct which approximate a predetermined size; and

(g) creating a third polynucleotide library by shuffling the first and second polynucleotide libraries together.

52. The method of claim 51, wherein the predetermined size is at least one size that is in the range from approximately the size of at least one parent polynucleotide to approximately the size of at least one other parent polynucleotide.

53. A method for producing a protein library, comprising the steps of:

(a) preparing a polynucleotide construct comprising at least two parent polynucleotides connected by a linker sequence, wherein the linker sequence comprises a restriction site for at least one restriction enzyme;

(b) digesting the construct;

(c) selecting fragments of the digested polynucleotide construct which approximate a predetermined size;

(d) circularizing the selected fragments;
(e) linearizing the circular fragments by cutting with the restriction enzyme;

(f) inserting the linearized fragments into an expression vector; and

(g) expressing the vector in an expression system.

54. The method of claim 53, wherein the digestion of the polynucleotide construct comprises limited digestion with at least one of a DNase and a nuclease.

55. The method of claim 54, wherein the DNase is DNase I.

56. The method of claim 54, wherein the nuclease is Exonuclease III.

57. The method of claim 53, wherein the predetermined size is at least one size that is in the range from approximately the size of at least one parent polynucleotide to approximately the size of at least one other parent polynucleotide.

58. The method of claim 53, wherein polynucleotide dimer fragments of the predetermined size are selected by gel electrophoresis.

59. The method of claim 53, wherein the ends of the polynucleotide dimer fragments are converted from staggered to blunt ends prior to circularization.

60. A protein produced by a method comprising the steps of:

(a) preparing a polynucleotide construct comprising at least two parent polynucleotides connected by a linker sequence;

(b) digesting the construct;

(c) selecting a fragment of the digested polynucleotide construct which approximates the size of at least one parent polynucleotide;

(d) circularizing the selected fragment;

(e) linearizing the circularized fragment by treatment with at least one restriction enzyme specific for at least one restriction site within the linker sequence;

(f) inserting the linearized fragment into an expression vector; and

(g) expressing the vector in an expression system.
61. A protein expressed from a vector comprising a linearized circular polynucleotide construct comprising polynucleotide sequences from at least two parent polynucleotides, wherein the N-terminal of a first parent polynucleotide is linked to the C-terminal of a second parent polynucleotide via a linker sequence.

62. The protein expressed from the vector of claim 61, wherein the vector further comprises a reporter molecule.

63. The protein expressed from the vector of claim 62, wherein the reporter molecule is a polynucleotide encoding a reporter protein.

64. A circular polynucleotide construct comprising polynucleotide sequences from at least two parent polynucleotides, wherein the N-terminal of a first parent polynucleotide is linked to the C-terminal of a second parent polynucleotide via a linker sequence.

65. The circular polynucleotide construct of claim 64, wherein the size of the polynucleotide construct is at least one size that is in the range from approximately the size of at least one parent polynucleotide to approximately the size of at least one other parent polynucleotide.

66. The circular polynucleotide construct of claim 64, wherein the linker sequence comprises a restriction site for at least one restriction enzyme.

67. The circular polynucleotide construct of claim 64, further comprising a reporter molecule.

68. The circular polynucleotide construct of claim 67, wherein the reporter molecule is a polynucleotide encoding a reporter protein.
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1 69. An expression vector produced by a method comprising the steps of hnearizing the 

2 circular polynucleotide construct in claim 66 by treatment with the restriction enzyme, 

3 and inserting the linearized polynucleotide construct into an expression vector. 

1 70. A protein expressed from the expression vector of claim 69. 

1 71. A circular polynucleotide construct comprising polynucleotide sequences from at least 

2 tv^^o parent polynucleotides, wherein 

3 (a) at least two polynucleotide sequences are connected by a linker sequence; 

4 (b) at least two polynucleotide sequences are truncated; and 

5 (c) the size of the polynucleotide sequences together approximate apredetermined 

6 size. 

1 72. The circular polynucleotide construct of claim 7 1 , wherein the predetermined size is 

2 at least one size that is in the range from approximately the size of at least one parent 

3 polynucleotide to approximately the size of at least one other parent polynucleotide. 

1 73. The circular polynucleotide construct of claim 71, further comprising a reporter 

2 molecule. 

1 74. The circular polynucleotide construct of claim 73, wherein the reporter molecule is a 

2 polynucleotide encoding a reporter protein. 

75. The circular polynucleotide construct of claim 71, wherein the linker sequence comprises a restriction site for at least one restriction enzyme.

76. An expression vector produced by a method comprising the steps of linearizing the circular polynucleotide construct in claim 75 by treatment with the restriction enzyme, and inserting the linearized polynucleotide construct into an expression vector.

77. A protein expressed from the expression vector of claim 76.

78. A vector construct comprising a polynucleotide encoding a chimeric protein and a stop codon in each of three reading frames positioned proximal to the 3' end of the polynucleotide, wherein the polynucleotide comprises one segment encoding the C-terminal end of a first parent protein and one segment encoding the N-terminal end of a second parent protein.

79. The vector construct in claim 76, wherein the size of the chimeric protein is at least one size that is in the range from approximately the size of at least one parent protein to approximately the size of at least one other parent protein.

80. The vector construct of claim 76, wherein the N-terminal end of the first parent protein provides a first property different from a second property provided by the C-terminal end of the second parent protein.

81. The vector construct of claim 78, wherein the first and second properties are selected from the group consisting of hydrophilicity, hydrophobicity, foldability, expressability, substrate specificity, reaction product, and enzyme activity.

82. The vector construct of claim 76, wherein at least one parent protein is a membrane- 
associated polypeptide. 

83. The vector construct of claim 76, wherein the sequence identity between at least two 
parent polynucleotides is less than 50%. 

84. The vector construct of claim 76, further comprising a reporter molecule. 

85. The vector construct of claim 84, wherein the reporter molecule is a polynucleotide 
encoding a reporter protein. 



86. A library of hybrid proteins comprising polypeptide sequences from at least two parent proteins, wherein the hybrid proteins comprise an N-terminal sequence corresponding to the N-terminal sequence of a first parent protein, and a C-terminal sequence corresponding to the C-terminal sequence of a second parent protein.

87. The library of claim 86, wherein the N-terminal sequence provides a first property different from a second property provided by the C-terminal sequence.

88. The library of claim 87, wherein the first and second properties are selected from the group consisting of hydrophilicity, hydrophobicity, foldability, expressability, substrate specificity, reaction product, and enzyme activity.

89. The library of claim 86, wherein at least one parent protein is a membrane-associated polypeptide.

90. The library of claim 86, wherein the sequence identity between at least two parent proteins is less than 50%.

91. The library of claim 86, wherein the size of the hybrid proteins is at least one size that is in the range from approximately the size of at least one parent protein to approximately the size of at least one other parent protein.

92. A protein encoded by a nucleotide sequence selected from the group consisting of [SEQ ID NO: 29], [SEQ ID NO: 30], [SEQ ID NO: 31], [SEQ ID NO: 32], and [SEQ ID NO: 33].

93. A method for producing a gene library, comprising the steps of:
(a) preparing a gene construct comprising at least two parent genes connected by a linker sequence;
(b) digesting the construct; and
(c) selecting fragments of the digested gene construct which approximate a predetermined size.

94. The method of claim 93, wherein at least two parent genes encode for the same polypeptide sequence.

95. The method of claim 93, wherein at least two genes encode for different polypeptide sequences.

96. The method of claim 93, wherein the digestion of the gene construct comprises limited digestion with DNase I.

97. The method of claim 93, wherein the predetermined size approximates the size of at least one parent gene.

99. The method of claim 93, wherein gene dimer fragments of the predetermined size are selected by gel electrophoresis.

100. The method of claim 93, wherein the ends of the digested gene dimers are converted from staggered to blunt ends.

101. The method of claim 93, wherein the selected gene fragments are inserted into expression vectors.

102. A method for producing a protein library, comprising expressing the vectors in claim 8 in a selected expression system.

103. A gene library produced by the method in claim 93.

104. A protein library produced by the method in claim 102.

105. A method for producing a protein library, comprising the steps of:
(a) preparing a gene construct comprising at least two parent genes connected by a linker sequence, wherein the linker sequence comprises a restriction site for at least one restriction enzyme;
(b) digesting the construct;
(c) selecting fragments of the digested gene construct which approximate a predetermined size;
(d) inserting the selected fragments into an expression vector; and
(e) expressing the vector in an expression system.

106. The method of claim 105, wherein the digestion of the gene construct comprises limited digestion with DNase I.

107. The method of claim 105, wherein the predetermined size approximates the size of at least one parent gene.

108. The method of claim 105, wherein gene dimer fragments of the predetermined size are selected by gel electrophoresis.

109. The method of claim 105, wherein the ends of the selected gene dimer fragments are converted from staggered to blunt ends.

110. A protein library produced by the method in claim 105.

111. The protein library in claim 110, comprising circularly permuted proteins.

112. A protein produced by a method comprising the steps of:
(a) preparing a gene construct comprising at least two parent genes connected by a linker sequence, wherein the two parent genes encode for the same polypeptide sequence;
(b) digesting the construct;
(c) selecting a fragment of the gene construct which approximates the size of at least one parent gene;
(d) inserting the selected gene fragment into an expression vector; and
(e) expressing the vector in an expression system.

113. A hybrid protein produced by a method comprising the steps of:
(a) preparing a gene construct comprising at least two parent genes connected by a linker sequence, wherein the two parent genes encode for different polypeptide sequences;
(b) digesting the construct;
(c) selecting a fragment of the gene construct which approximates the size of at least one parent gene;
(d) inserting the selected gene fragment into an expression vector; and
(e) expressing the vector in an expression system.

114. A method for producing a gene library comprising the steps of:
(a) constructing a first and second parent gene construct, each comprising a region encoding for a polypeptide, an upstream primer, and a downstream primer;
wherein one primer of the first gene construct comprises a restriction site for a first restriction enzyme, and one primer comprises a restriction site for a second restriction enzyme; and
one primer of the second gene construct comprises a restriction site for the second restriction enzyme, and one primer comprises a restriction site for a third restriction enzyme;
(b) cutting the gene constructs with a mixture of restriction enzymes, the first gene construct being cut with the first and second restriction enzymes, and the second gene construct being cut with the second and third restriction enzymes;
(c) cutting a vector with the first and third restriction enzymes;
(d) ligating the first gene construct, the second gene construct, and the vector, to form a vector construct comprising a gene dimer connected by a linker sequence;
(e) amplifying the vector construct by PCR;
(f) excising a gene dimer from the amplified vector by cutting with the first and third restriction enzymes;
(g) digesting the gene dimer;
(h) selecting digested gene dimers of a predetermined size.

115. The method of claim 114, wherein the two parent genes encode for the same polypeptide sequence.

116. The method of claim 114, wherein the two parent genes encode for different polypeptide sequences.

117. The method of claim 114, wherein the predetermined size approximates the size of at least one parent gene.

118. The method of claim 93, wherein at least one parent gene in the construct provides a first property different from a second property provided by at least one other parent gene of the construct.

119. The method of claim 118, wherein the first and second properties are selected from the group consisting of hydrophilicity, hydrophobicity, foldability, expressability, substrate specificity, reaction product, and enzyme activity.

120. The method of claim 105, wherein at least one parent gene in the construct provides a first property different from a second property provided by at least one other parent gene of the construct.

121. The method of claim 120, wherein the first and second properties are selected from the group consisting of hydrophilicity, hydrophobicity, foldability, expressability, substrate specificity, reaction product, and enzyme activity.

Figure 1A. Construction of a gene dimer. The dimer can be a homodimer or a heterodimer.

(1) Limited DNase I digestion.

(3) T4 DNA polymerase repairs the staggered ends.




Figure 1B. Strategy for constructing a library of gene fragments corresponding in size to the genes of the original protein(s). The starting gene dimer can be a homodimer or a heterodimer.
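
The Figure 1B strategy can be illustrated with a simple simulation: many copies of a gene dimer (parent 1, linker, parent 2) are cut at random positions, and only fragments close to the size of a parent gene are kept, mimicking gel purification. A fragment that spans the linker carries the 3' (C-terminal) part of parent 1 followed by the 5' (N-terminal) part of parent 2; in a homodimer such a fragment is approximately a circular permutation of the gene, with the linker joining the original termini. Fragments lying wholly inside one parent correspond to truncated genes. This is a sketch under assumptions not taken from the source: cut sites are uniformly random, the lengths (720 bp genes, 30 bp linker) and the ±20 bp size window are arbitrary illustrative values, and reading frame is ignored.

```python
import random


def digest(length, n_cuts, rng):
    """Cut a linear molecule of `length` bp at `n_cuts` distinct random positions.

    Uniform random cutting is a simplifying stand-in for limited DNase I
    digestion. Returns the (start, end) interval of each resulting fragment.
    """
    cuts = sorted(rng.sample(range(1, length), n_cuts))
    bounds = [0] + cuts + [length]
    return list(zip(bounds, bounds[1:]))


def build_library(len1, linker_len, len2, copies, n_cuts, tol, seed=0):
    """Digest `copies` dimers (parent1-linker-parent2) and keep fragments
    within `tol` bp of the parent 1 length (the size-selection step)."""
    rng = random.Random(seed)
    total = len1 + linker_len + len2
    library = []
    for _ in range(copies):
        for start, end in digest(total, n_cuts, rng):
            if abs((end - start) - len1) <= tol:
                library.append((start, end))
    return library


def composition(start, end, len1, linker_len):
    """Split fragment [start, end) into bp from parent 1, the linker, and parent 2."""
    link_start, link_end = len1, len1 + linker_len
    from_p1 = max(0, min(end, link_start) - start)
    from_linker = max(0, min(end, link_end) - max(start, link_start))
    from_p2 = max(0, end - max(start, link_end))
    return from_p1, from_linker, from_p2


if __name__ == "__main__":
    lib = build_library(len1=720, linker_len=30, len2=720, copies=2000, n_cuts=3, tol=20)
    print(len(lib), "fragments kept")
    for start, end in lib[:5]:
        print((start, end), composition(start, end, len1=720, linker_len=30))
```

Each printed triple shows how many base pairs of a kept fragment come from the 3' end of parent 1, from the linker, and from the 5' end of parent 2, so the spread of crossover positions across the library can be inspected directly.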

Figure 2. Results of screening plasmids from active GFP clones by restriction digestion. A shows the restriction sites used for digestion. B summarizes the results of the double-enzyme digestions (BamHI + EcoRI, BamHI + XhoI, and BamHI + SfiI). Insert types: a, intact GFP gene with an extra fragment upstream; b, intact GFP gene with an extra fragment downstream; c, two overlapping fragments; d, recovered wild-type-like genes; e, complementary fragments; f, truncated genes.

Figure 3. Construction of a gene heterodimer.

(1) Limited DNase I digestion.




(2) Gel-purify gene monomers.

(3) Repair the staggered DNA ends to create blunt ends.




Figure 4A. Strategy for constructing a library of gene fragments corresponding in size to the genes of the original proteins.

(1) Limited DNase I digestion.



(2) Repair the staggered DNA ends to 
create blunt ends. 

(3) Gel-purify gene monomers.



Figure 4B. Strategy for constructing a library of gene fragments 
corresponding in size to the size of the genes of the original proteins. 

(4) Circularize by blunt-end ligation.




(5) Cut with restriction enzymes R4 and

(6) Add expression vector & ligate.




Figure 5. Strategy for constructing a library of hybrid proteins with a single crossover between the two parent proteins, or a library with small interior sequence deletions or duplications. x designates the position of the crossover between the two proteins, which is created by the blunt-end ligation.
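
The single-crossover strategy described above can be modeled on sequences. A size-selected fragment containing the 3' part of parent 1, the linker, and the 5' part of parent 2 is closed into a circle by blunt-end ligation, which creates the junction x, and is then reopened inside the linker. The result is the 5' segment of parent 2 joined at x to the 3' segment of parent 1. The functions below are hypothetical helpers written for this sketch; they assume equal-length parents aligned position by position, translation starting at the first base of parent 2, and a cut that removes the whole linker, none of which is specified in the source.

```python
def circularize_and_open(p1, linker, p2, start, end):
    """Model a size-selected fragment [start, end) of p1 + linker + p2 that is
    circularized by blunt-end ligation and reopened inside the linker.

    Assumption: the reopening cut removes the whole linker. Returns the hybrid
    (5' part of p2 joined at x to the 3' part of p1) and k, the number of p2
    bases retained.
    """
    link_start = len(p1)
    link_end = link_start + len(linker)
    if not (start < link_start and end > link_end):
        raise ValueError("fragment must contain the whole linker")
    k = end - link_end
    return p2[:k] + p1[start:], k


def classify(k, start):
    """Classify the junction, assuming equal-length, position-aligned parents."""
    delta = k - start  # bp gained (+) or lost (-) relative to one parent gene
    if delta == 0:
        kind = "exact single crossover"
    elif delta > 0:
        kind = f"duplication of {delta} bp"
    else:
        kind = f"deletion of {-delta} bp"
    # p1[start:] stays in the frame set by p2 only if k and start agree mod 3.
    frame = "in frame" if delta % 3 == 0 else "frameshift"
    return kind, frame


if __name__ == "__main__":
    hybrid, k = circularize_and_open("A" * 12, "LLLL", "B" * 12, start=5, end=22)
    print(hybrid, classify(k, 5))  # BBBBBBAAAAAAA ('duplication of 1 bp', 'frameshift')
```

Comparing k with the fragment start shows how one procedure yields exact crossovers, small duplications, or small deletions, the outcomes named in the caption, and whether each one preserves the reading frame.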






WO 01/30998



PCT/US00/29717









(5) Cut with restriction enzyme R3.



(6) Ligate fragments into concatemers.




(7) Randomly fragment and purify by size. 



(8) Circularize by blunt-end ligation.

(9) Go to step (5) above for more rounds of shuffling, or go to step (5) of Fig. 3 to clone the library in an expression vector.



Figure 6. Strategy for constructing a library of hybrid proteins with several crossover points between two or more parent proteins. X designates the position of the crossover between the different proteins.

(6) Conventional DNA shuffling.







(7) Cut, add expression vector & ligate.



Plasmid library of 
hybrid genes with 
multiple cross overs. 



Figure 7. Another strategy for constructing a library of hybrid proteins with 
several crossover points between two parent proteins. 

RC1
[Nucleotide and amino acid sequence not legible in this copy.]

RC2
ATGACAATTAAAGAAATGCCTCAGCCAAAACGTTTGGAGAGCCCCAAAGGCCTGAAAAGT
 M  T  I  K  E  M  P  Q  P  K  R  L  E  S  P  K  G  L  K  S
Figure 8. N-terminal nucleotide and amino acid sequences of two hybrid proteins. 

INTERNATIONAL SEARCH REPORT



International Application No

PCT/US 00/29717 



A. CLASSIFICATION OF SUBJECT MATTER

IPC 7 C12N15/10 C12Q1/68 //C07K14/435,C12N9/02 



According to International Patent Classification (IPC) or to both national classification and IPC



B. FIELDS SEARCHED



Minimum documentation searched (classification system followed by classification symbols)

IPC 7 C12N C12Q 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched